AN APPROACH TO TIME SERIES ANALYSIS¹
By EMANUEL PARZEN 
Stanford University 


Summary. It may fairly be said that modern time series analysis is a subject 
which embraces three fields which while closely related have tended to develop 
somewhat independently. These fields are (i) statistical communication and 
control theory, (ii) the probabilistic (and Hilbert space) theory of stochastic 
processes possessing finite second moments, and (iii) the statistical theory of 
regression analysis, correlation analysis, and spectral (or harmonic) analysis of 
time series. In this paper it is my aim to show the close relation between these 
fields and to summarize some recent developments. 

The topics discussed are (i) stationary time series and their statistical analysis, 
(ii) prediction theory and the Hilbert space spanned by a time series, and (iii) 
regression analysis of time series with known covariance function. In particular, 
I describe a new approach to prediction and regression problems using reproduc- 
ing kernel Hilbert spaces. 


1. Introduction. A set of observations arranged chronologically is called a 
time series. Time series are observed in connection with quite diverse phenom- 
ena, and by a wide variety of researchers, such as (1) the economist observing 
yearly wheat prices, (2) the geneticist observing daily egg production of a cer- 
tain breed of hen, (3) the meteorologist studying daily rainfall in a given city, 
(4) the physicist studying the ambient noise level at a given point in the ocean, 
(5) the aerodynamicist studying atmospheric turbulence gust velocities, (6) the 
electronic engineer studying the internal noise of a radio receiver, and so on. 

Time series analysis constitutes one of the most important tools of the econo- 
mist. Consider the prices or quantities of commodities traded on an exchange. 
The record of prices or quantities over time may be represented as a fluctuating 
function (or wiggly record). The analysis of such economic time series is a prob- 
lem of great interest to economists desiring to explain the dynamics of economic 
systems and to speculators desiring to forecast prices. 

Techniques of time series analysis have long been used in science and
engineering (for example, to smooth data and to search for “periodicities” [6]). The
theory and practice of time series analysis is assuming new importance in the
space age since a wide variety of problems involving communication and/or
control (involving such diverse problems as the automatic tracking of moving
objects, the reception of radio signals in the presence of natural and artificial
disturbances, the reproduction of sound and images, the design of guidance
systems, the design of control systems for industrial processes, and the analysis of
any kind of record representing observation over time) may be regarded as
problems in time series analysis.

Received September 27, 1960.

1 An address presented on August 25, 1960 at the Stanford meetings of the Institute of
Mathematical Statistics by invitation of the IMS Committee on Special Invited Papers.
This paper was prepared with the partial support of the Office of Naval Research
(Nonr-225-21). Reproduction in whole or part is permitted for any purpose of the United
States Government.

To represent a time series, one proceeds as follows. The set of time points at
which measurements are made is called $T$. The observation made at time $t$ is
denoted by $X(t)$. The set of observations $\{X(t),\ t \in T\}$ is called a time series.

In regard to the index set $T$, there are cases of particular importance. One
may be observing (i) a discrete parameter time series $X(t)$, in which case one
assumes $T$ is a finite set of points written $T = \{1, 2, \ldots, N\}$, (ii) a continuous
parameter time series, in which case $T$ is a finite interval written
$T = \{t : 0 \le t \le L\}$, (iii) a multiple (discrete or continuous parameter) time series
$\{(X_1(t), \ldots, X_k(t)),\ t \in T'\}$ which may be written as a time series
$\{X(t),\ t \in T\}$ with index set $T = \{(j, t) : j = 1, \ldots, k \text{ and } t \in T'\}$, or (iv) a space
field $X(x, y, z, t)$ defined on space-time which is a function of three coordinates
of position and one coordinate of time.

The basic idea of the statistical theory of analysis of a time series $\{X(t),\ t \in T\}$
is to regard the time series as being an observation made on a family of random
variables $\{X(t),\ t \in T\}$; that is, for each $t$ in $T$, $X(t)$ is an observed value of a
random variable. A family of random variables $\{X(t),\ t \in T\}$ is called a stochastic
process. An observed time series $\{X(t),\ t \in T\}$ is thus regarded as an observation
(or, in a different terminology, a realization) of a stochastic process $\{X(t),\ t \in T\}$.

It has been pointed out by various writers (see, for example, Neyman [34])
that there are two broad categories of statistical problems: problems of
stochastic model building for natural phenomena and problems of statistical
decision making. These two categories of problems are well illustrated in the
analysis of economic time series; some study time series in order to understand
the mechanism of the economic system while others study time series with the
simple aim of being able to forecast, for example, stock market prices. In
general, it may be said that the aims of time series analysis are

(1) to understand the mechanism generating the time series,

(2) to predict the behavior of the time series in the future.

To attack either of these problems, one adopts a model for the time series.

A model often adopted for the analysis of an observed time series $\{X(t),\ t \in T\}$
is to regard $X(t)$ as the sum of two functions:


(1.1)    $X(t) = m(t) + Y(t), \qquad t \in T.$


We call m(t) the mean value function and Y(t) the fluctuation function. 
The stochastic process Y(t) is assumed to possess finite second moments, and 
to have zero means and covariance kernel 


(1.2) K(s, t) = E[Y(s)Y(t)]. 


In addition it is often assumed that $Y(t)$ is a normal process in the sense that
for every finite subset $\{t_1, \ldots, t_n\}$ of $T$, the random variables $Y(t_1), \ldots, Y(t_n)$
are jointly normally distributed.

The mean value function 


(1.3) m(t) = E[X(t)] 


is assumed to belong to a known class $M$ of functions. For example, $M$ may be
the set of all linear combinations of $q$ known functions $w_1(t), \ldots, w_q(t)$; then,
for $t$ in $T$,


(1.4)    $m(t) = \beta_1 w_1(t) + \cdots + \beta_q w_q(t),$


for some coefficients $\beta_1, \ldots, \beta_q$ to be estimated. Other possible assumptions
often made concerning the mean value function $m(t)$ are as follows: (i) $m(t)$
represents a systematic oscillation,


(1.5)    $m(t) = \sum_{j=1}^{q} A_j \cos(\omega_j t + \varphi_j),$


in which the amplitudes $A_j$, the angular frequencies $\omega_j$, and the phases $\varphi_j$ are
constants, some of which are given and the rest are unknown and are to be
estimated; (ii) $m(t)$ represents a polynomial trend,


(1.6)    $m(t) = \sum_{j=0}^{q-1} \beta_j t^j,$


an assumption often adopted if $m(t)$ represents the trajectory [given by
$m(t) = x_0 + vt + at^2$, say] of a moving object, or (iii) $m(t)$ is the sum of a systematic
oscillation and a polynomial trend, an assumption traditionally adopted in
treating economic time series.
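
As an illustration of the linear model (1.4), the following is a minimal sketch of classical least squares estimation of the coefficients, one of the estimation methods named later in this Introduction. The particular known functions, the true coefficients, the noise level, and the helper names `solve` and `least_squares` are hypothetical choices made only for this example, and the fluctuations are taken to be independent normal variables, which is a special case of the model (1.1).

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(ts, xs, basis):
    """Classical least squares estimate of beta in m(t) = sum_j beta_j w_j(t)."""
    design = [[w(t) for w in basis] for t in ts]  # N x q matrix of w_j(t_i)
    q = len(basis)
    gram = [[sum(row[i] * row[j] for row in design) for j in range(q)]
            for i in range(q)]
    rhs = [sum(row[i] * x for row, x in zip(design, xs)) for i in range(q)]
    return solve(gram, rhs)


# Hypothetical known functions: constant, linear trend, cosine of known frequency.
basis = [lambda t: 1.0, lambda t: float(t), lambda t: math.cos(0.5 * t)]
true_beta = [2.0, 0.1, 1.5]

rng = random.Random(0)
ts = list(range(1, 201))  # T = {1, 2, ..., N} with N = 200
mean_values = [sum(b * w(t) for b, w in zip(true_beta, basis)) for t in ts]
xs = [m + rng.gauss(0.0, 0.3) for m in mean_values]  # X(t) = m(t) + Y(t)

beta_hat = least_squares(ts, xs, basis)
```

With noise-free input the estimate reproduces the coefficients up to rounding error; with the simulated fluctuations it recovers them approximately. The sketch ignores any correlation in $Y(t)$; the paper's later sections treat the case of a known covariance function.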


Early workers in time series analysis sought to explain the dependence
between successive observations of a time series $X(t)$ by assuming that $X(t)$
[sometimes written $X_t$] was generated by a scheme of the following kind:


(1.7)    $X_t = m(t) + Y_t,$


where $m(t)$ represents a systematic oscillation of the form of (1.5) and the
fluctuations $Y_1, \ldots, Y_N$ are assumed to be independent, normal random
variables with mean 0 and common unknown variance $\sigma^2$.

The model given by (1.7) is called the scheme of hidden periodicities and was 
first introduced by Schuster ([47], [48]). The method used to estimate the
frequencies $\omega_j$ (or, equivalently, the periods $2\pi/\omega_j$) is called periodogram analysis.
The problem of tests of significance in periodogram analysis ([11], [18]) played
an important role in the early history of time series analysis. 
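
The idea of periodogram analysis can be sketched as follows for the scheme (1.7) with a single hidden periodicity. The normalization of the periodogram used here, the grid search over frequencies, and all numerical values are assumptions made for illustration; the paper does not specify them at this point.

```python
import cmath
import math
import random


def periodogram(xs, omega):
    """I(omega) = |sum_t X_t exp(-i omega t)|^2 / N (one common normalization)."""
    n = len(xs)
    total = sum(x * cmath.exp(-1j * omega * t) for t, x in enumerate(xs, start=1))
    return abs(total) ** 2 / n


def dominant_frequency(xs, grid_size=500):
    """Return the grid frequency in (0, pi) at which the periodogram is largest."""
    grid = [math.pi * k / grid_size for k in range(1, grid_size)]
    return max(grid, key=lambda w: periodogram(xs, w))


# Hypothetical data: one cosine of angular frequency 0.7 plus independent normal noise.
rng = random.Random(1)
omega_true = 0.7
xs = [2.0 * math.cos(omega_true * t + 0.3) + rng.gauss(0.0, 1.0)
      for t in range(1, 401)]

omega_hat = dominant_frequency(xs)
period_hat = 2.0 * math.pi / omega_hat  # estimated period 2*pi/omega
```

The location of the largest periodogram ordinate serves as the estimate of the hidden frequency. Deciding whether such a peak is significant is the testing problem referred to above and is not attempted in this sketch.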

Approaches to time series analysis which seem to be more fruitful than periodo- 
gram analysis (see Kendall [22]) were pioneered by Yule and Slutsky in the 
1920’s. 

Yule’s researches [60] led to the notion of the autoregressive scheme, in which
a time series $X_t$ is assumed to be generated as a linear function of its past values,
plus a random shock; in symbols, for some integer $m$ (called the order of the
autoregressive scheme) and constants $a_1, \ldots, a_m$,


$X_t = a_1 X_{t-1} + \cdots + a_m X_{t-m} + \eta_t,$


in which the sequence $\{\eta_t\}$ consists of independent identically distributed random
variables. In particular, Yule showed that an autoregressive scheme of order 2
provided a better model for sunspots than did the scheme of hidden periodicities.
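
A minimal sketch of an autoregressive scheme of order 2 follows: it simulates the scheme and then recovers the two constants by least squares regression of each value on its two predecessors. The coefficient values, the standard normal shocks, the burn-in length, and the function names are assumptions chosen for the example and are not taken from Yule's work.

```python
import random


def simulate_ar(coeffs, n, rng, burn_in=200):
    """Simulate X_t = a_1 X_{t-1} + ... + a_m X_{t-m} + eta_t with N(0, 1) shocks."""
    m = len(coeffs)
    x = [0.0] * m  # arbitrary starting values, discarded with the burn-in
    for _ in range(n + burn_in):
        past = x[-1:-m - 1:-1]  # X_{t-1}, ..., X_{t-m}
        x.append(sum(a * p for a, p in zip(coeffs, past)) + rng.gauss(0.0, 1.0))
    return x[-n:]


def fit_ar2(xs):
    """Least squares estimate of (a_1, a_2) from the regression of X_t on its past."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(xs)):
        y, u, v = xs[t], xs[t - 1], xs[t - 2]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        r1 += u * y
        r2 += v * y
    det = s11 * s22 - s12 * s12  # determinant of the 2 x 2 normal equations
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


# Hypothetical coefficients giving a stationary, oscillatory scheme.
series = simulate_ar([1.3, -0.6], 5000, random.Random(2))
a1_hat, a2_hat = fit_ar2(series)
```

The chosen constants give complex characteristic roots of modulus less than one, so the simulated record shows the disturbed oscillation that makes such schemes attractive as models for series like sunspot numbers.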

Slutsky’s researches [51] led to the notion of a moving average scheme, in which
a time series $X_t$ is assumed to be generated as a finite moving average of a
sequence of independent and identically distributed random variables $\{\eta_t\}$; in
symbols, for some integer $m$ and constants $a_0, \ldots, a_m$,


$X_t = a_0 \eta_t + a_1 \eta_{t-1} + \cdots + a_m \eta_{t-m}.$


Slutsky showed that moving averages exhibit properties of disturbed periodicity 
and consequently can be used as a model for oscillatory time series. In particu- 
lar, Slutsky proved the Sinusoidal Limit Theorem which showed that a sine 
wave could be approximated by a moving average scheme. 
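
One property of a moving average scheme is easy to verify directly: if the shocks have mean zero and finite variance $\sigma^2$ (an assumption added here), the covariance between $X_t$ and $X_{t+v}$ is $\sigma^2 \sum_k a_k a_{k+|v|}$, which vanishes for $|v| > m$. The sketch below computes this quantity; the function name and the sample coefficients are illustrative.

```python
def ma_covariance(coeffs, lag, sigma2=1.0):
    """Covariance at the given lag of X_t = sum_{k=0}^{m} a_k eta_{t-k}.

    Assumes uncorrelated shocks with mean zero and variance sigma2.
    """
    v = abs(lag)
    # zip stops at the shorter sequence, so the sum is empty (zero) once v > m.
    return sigma2 * sum(a * b for a, b in zip(coeffs, coeffs[v:]))


# Hypothetical scheme of order m = 1: X_t = eta_t + 0.5 * eta_{t-1}.
coeffs = [1.0, 0.5]
covariances = [ma_covariance(coeffs, v) for v in range(4)]
```

For the scheme shown, the covariances at lags 0 and 1 are 1.25 and 0.5, and all covariances at higher lags are zero.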


In the 1930’s and 1940’s, the probabilistic theory of stationary time series 
was developed, first as a result of the development of ergodic theory and then 
as a result of prediction theory. That the autoregressive and moving average 
schemes may be interpreted as special cases of the theory of stationary processes 
was pointed out by Wold [58] in 1938 (see [57], p. 169). Thus the link was
established between the statistical theory of analysis of time series and the
probabilistic theory of the structure of time series. In the last twenty years, an
extensive literature has developed exploring this link.

It may fairly be said that modern time series analysis is a subject which em- 
braces three fields which while closely related have tended to develop somewhat 
independently. These fields are (i) statistical communication and control theory
([26], [32]), (ii) the probabilistic (and Hilbert space) theory of stochastic processes
possessing finite second moments ([9], Chaps. 9-12; [28], Chap. 10), and (iii)
the statistical theory of regression analysis, correlation analysis, and spectral
(or harmonic) analysis of time series ([13], [17], [23], [58]). In this paper it is my
aim to show the close relation between these fields and to summarize some re- 
cent developments with which I have been closely associated. The contents of 
the paper are as follows. 

(I) Stationary time series and their statistical analysis. While it is a fiction to
regard an observed time series as having zero means, it is mathematically
convenient to consider the analysis of time series under this assumption.
Consequently, one may consider the analysis of an observed time series $\{X(t),\ t \in T\}$
with vanishing mean value function and unknown covariance function.

It has long been traditional among physical scientists to regard time series as 
arising from a superposition of sinusoidal waves of various amplitudes, fre- 
quencies, and phases. In the theory of time series analysis and statistical com- 
munications theory, a central role is played by the notion of the spectrum of a 
time series. For the time series analyst, the spectrum represents a basic tool for
determining the mechanism generating an observed time series. For the com- 
munication theorist, the spectrum provides the major concept in terms of which 
to analyze the effect of passing stochastic processes (representing either signal 
or noise) through linear (and, to some extent, non-linear) devices. Spectral (or 
harmonic) analysis is concerned with the theory of the decomposition of a time 
series into sinusoidal components. For many time functions $\{X(t),\ -\infty < t < \infty\}$,
such a decomposition is provided by the Fourier transform


$S(\omega) = \int_{-\infty}^{\infty} e^{-it\omega} X(t)\, dt.$


Unfortunately, no meaning can be attached to this integral for many stochastic
processes $\{X(t),\ -\infty < t < \infty\}$ since their sample functions are nonperiodic
undamped functions and therefore do not belong to the class of functions dealt
with in the usual theories of Fourier series and Fourier integrals. Nevertheless, it
is possible to define a notion of harmonic analysis of stochastic processes (that is,
a method of assigning to each frequency $\omega$ a measure of its contribution to the
“content” of the process) as was first shown by N. Wiener [56] and A. Khint- 
chine [24]. Among the stochastic processes which possess a harmonic analysis, 
stationary processes are most important since a time series may be represented 
as a superposition of sinusoidal waveforms with “independent amplitudes” if
and only if it is stationary (see Section 4 for a more precise form of this asser- 
tion). In Section 2, some basic results concerning stationary processes are sum- 
marized. 

Much of the recent statistical literature on time series analysis has been con- 
cerned with questions of statistical inference on stationary time series and es- 
pecially with 

(i) deriving the exact and asymptotic distributions of various estimates of
the covariance function $R(v)$ and the normalized covariance (or correlation)
function $\rho(v) = R(v)/R(0)$ of a stationary time series,

(ii) fitting stationary time series by mechanisms (such as autoregressive 
schemes or moving average schemes) which are completely specified except for 
a finite number of parameters, and with estimating the parameters of such 
schemes, 

(iii) estimating (and forming confidence sets for) the spectral density function
and spectral distribution function of a stationary time series.
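
To make topic (i) concrete, here is a minimal sketch of one commonly used estimate of the covariance function and the correlation function of a zero-mean stationary series observed at $t = 1, \ldots, N$. The divisor $N$ (rather than $N - |v|$) and the function names are assumptions made for this example; other estimates are also in use.

```python
def sample_covariance(xs, lag):
    """Estimate R(v) by (1/N) * sum_{t=1}^{N-|v|} X(t) X(t + |v|) for a zero-mean series."""
    n = len(xs)
    v = abs(lag)
    return sum(xs[t] * xs[t + v] for t in range(n - v)) / n


def sample_correlation(xs, lag):
    """Estimate the normalized covariance rho(v) = R(v) / R(0)."""
    return sample_covariance(xs, lag) / sample_covariance(xs, 0)


# Hypothetical short record that alternates in sign.
record = [1.0, -1.0, 1.0, -1.0]
r0 = sample_covariance(record, 0)
rho1 = sample_correlation(record, 1)
```

For this alternating record the estimated variance is 1 and the estimated correlation at lag 1 is -0.75, reflecting the strong negative dependence between successive observations.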

For many purposes, it is preferable to estimate the spectrum of a stationary 
time series rather than its correlation function, since many aspects of a stationary 
time series are best understood in terms of its spectrum. The spectrum enables 
one to (i) investigate the physical mechanism generating a time series, (ii) de- 
termine the behavior of a dynamic linear system in response to random excita- 
tions, and (iii) possibly simulate a time series. Other uses of the spectrum are as 
operational means (i) of transmitting or detecting signals, (ii) of classifying rec- 
ords of phenomena such as brain waves, (iii) of studying radio propagation 
phenomena, and (iv) of determining characteristics of control systems. The
theory of statistical spectral analysis is too extensive to be reviewed here. For 
surveys of this theory, see Bartlett [3], Hannan [17], Blackman and Tukey [4], 
Jenkins [20], Parzen [41], Rosenblatt [45], and Tukey [53]. 

A comprehensive survey of the results available on topics (i) and (ii) has been 
given recently by Hannan [17]. Other comprehensive reviews are given by 
Bartlett [3], Moran [33], and Wold [57], Chap. 11; in these reviews one may 
find references to the work of M. S. Bartlett, H. E. Daniels, J. Durbin, E. J.
Hannan, G. M. Jenkins, M. H. Quenouille, A. M. Walker, G. S. Watson, and
P. Whittle. In Section 8, I have attempted to give an introduction to some of 
the large sample results available on topics (i) and (ii). 

(II) Prediction theory and the Hilbert space spanned by a time series. A basic
problem in time series analysis is that of minimum mean square error linear
prediction. Let $Z$ be an unobserved random variable with finite second moment.
Let $\{X(t),\ t \in T\}$ be an observed time series. One seeks that random variable,
linear in the observations, whose mean square distance from $Z$ is smallest. In
other words, if one desires to predict the value of $Z$, on the basis of having
observed the values of the time series $\{X(t),\ t \in T\}$, one method might be to take
that linear functional in the observations, denoted by $E^*[Z \mid X(t),\ t \in T]$, whose
mean square error as a predictor is least. (The symbol $E^*$ is used to denote a
predictor because in the case of jointly normally distributed random variables,
the best linear predictor $E^*[Z \mid X(t),\ t \in T]$ coincides with the conditional
expectation $E[Z \mid X(t),\ t \in T]$; for an elementary discussion of this fact, see Parzen
([38], p. 387). Indeed, it should be noted that in any event the conditional
expectation $E[Z \mid X(t),\ t \in T]$ can be defined as the minimum mean square error
non-linear predictor.)


The prediction problem has provided a framework in terms of which many 
problems of statistical communication theory have come to be formulated. The 
pioneering work on prediction theory was done by Wiener [56a] and Kolmogorov 
[25] who were concerned with a stationary time series which had been observed 
over a semi-infinite interval of time. They sought predictors which had
minimum mean square error over all possible linear predictors. Wiener showed how the
solution of the prediction problem could be reduced to the solution of the so- 
called Wiener-Hopf integral equation, and gave a method (spectral factorization) 
for the solution of the integral equation. Simplified methods of solution of this 
equation in the practically important special case of rational spectral density 
functions were given by Zadeh and Ragazzini [61] and Bode and Shannon [5]. 
Zadeh and Ragazzini [62] also treated the problem of regression analysis of 
time series with stationary fluctuation function, by reducing the problem to one 
involving the solution of a Wiener-Hopf equation. There then developed an ex- 
tensive literature, seeking to treat prediction and smoothing problems involving 
a finite time of observation and non-stationary time series. The methods em- 
ployed were either to reduce the problem to the solution of a suitable integral 
equation (generalization of the Wiener-Hopf equation) or to employ expansions 
(in a series of suitable eigenfunctions) of the time series involved.

As a result of these developments, prediction theory has turned out to
provide a theory of the structure of time series and to provide mathematical tools
for the solutions of other problems besides the prediction problem, especially 
regression problems. In Section 3, it is shown how the prediction problem leads 
naturally to the introduction of the important notion of the Hilbert space spanned 
by a time series which plays a central role in modern time series analysis. In 
Sections 4 and 6, I describe an approach to prediction and regression problems 
(in terms of reproducing kernel Hilbert spaces) which may be called coordinate 
free, and which by the introduction of suitable coordinate systems contains 
previous approaches as special cases. 

The approach I take seems to me to be a rigorous version of an approach that 
is being developed in the Soviet Union by V. S. Pugachev [44]. Pugachev has in
recent years advanced a point of view, which he calls the method of canonic
representations of random functions, for which in one of his articles [43] he
makes the following claim. “The results of this article, together with the results
of [previous] papers, permits us to state that the method of canonic representa- 
tions of random functions is the foundation of the modern statistical theory of 
optimum systems.” It is my feeling that reproducing kernel Hilbert spaces 
provide a more powerful and more elegant means of achieving in a unified man- 
ner the results which Pugachev has sought to unify by the method of canonic 
representations. 


(III) Regression analysis of time series with known covariance function. Let the
observed time series be of the form of (1.1) with unknown mean value function
$m(t)$ and known covariance function $K(s, t)$. Various methods of forming
estimates of $m(t)$ are available. The most important methods are classical least
squares estimation and minimum variance linear unbiased estimation. In the 
case of normally distributed observations, one has in addition the methods of 
maximum likelihood estimation and minimum variance unbiased estimation. 
In Sections 6 and 7, it is shown how Hilbert space techniques may be used to 
form explicit expressions for these estimates in terms of certain so-called re- 
producing kernel inner products. 

There are, of course, large numbers of important problem areas of time series 
analysis which have not been mentioned in the foregoing such as (i) the problem 
of the distribution of zero-crossings and extrema of a time series (see
Longuet-Higgins [30] and Slepian [50]), (ii) the problem of the asymptotic
efficiency of various classes of estimates of regression coefficients (see Grenander 
and Rosenblatt [13], Hebbe [19], and Striebel [52]), (iii) the use of filters to 
eliminate or extract trend or other components of a time series, and (iv) the 
distribution of various functionals of a time series, such as quadratic forms. 

Further, the statistical analysis of multiple time series is not discussed. The 
relations that exist between different time series are on the whole a problem of
greater interest than the relations that exist within a single time series. The 
results which exist under categories (I), (II), and (III) for univariate time series 
can be formally extended to multiple time series. However, many new problems 
arise which have not been thoroughly investigated. 







A word should be said about the references given at the end of this paper. I 
have given a representative list rather than a complete list. Fortunately, a 
complete list of references will soon be available. The International Statistical 
Institute is compiling a bibliography on Time Series and Stochastic Processes 
which is to list and classify books and papers published, in the years 1900-1959, 
on both theory and applications. A bibliography (Parzen [42a]) of American 
publications has been compiled at Stanford for inclusion in the I.S.I.
bibliography; a limited number of copies of this bibliography are available, and may
be obtained by writing to the author. A bibliography is also given by Deming [8]. 


2. Stationary time series. A discrete parameter time series
$\{X(t),\ t = 0, \pm 1, \ldots\}$
or a continuous parameter time series $\{X(t),\ -\infty < t < \infty\}$ is said to be (weakly
or wide-sense) stationary if the product moment


$E[X(s)X(s + t)] = R(t)$


is a function only of t. One calls R(-) the covariance function of the stationary 
time series. 

It was shown by Khintchine [24] in 1933 that in the continuous parameter
case there exists a non-decreasing bounded function $F(\omega)$, defined for
$-\infty < \omega < \infty$, such that


(2.1)    $R(t) = \int_{-\infty}^{\infty} e^{it\omega}\, dF(\omega), \qquad -\infty < t < \infty,$


if it is assumed that $R(\cdot)$ is continuous at $t = 0$. Wold [58] in 1938 showed
that in the discrete parameter case there exists a non-decreasing bounded
function $F(\omega)$, defined for $-\pi \le \omega \le \pi$, such that


(2.2)    $R(t) = \int_{-\pi}^{\pi} e^{it\omega}\, dF(\omega), \qquad t = 0, \pm 1, \ldots.$


The function $F(\omega)$ is called the spectral distribution function of the time series.
Like a probability distribution function, $F(\omega)$ can be uniquely written as the
sum,


$F(\omega) = F_d(\omega) + F_{sc}(\omega) + F_{ac}(\omega),$


of three distribution functions with the following properties. The function
$F_{ac}(\omega)$ is absolutely continuous and is the integral of a non-negative function
$f(\omega)$ called the spectral density function of the time series. The function $F_d(\omega)$
is a purely discontinuous (or discrete or step) function:


$F_d(\omega) = \sum_{\omega_\nu \le \omega} \Delta F(\omega_\nu),$


where $\{\omega_\nu\}$ are the discontinuity points of $F(\omega)$, and
$\Delta F(\omega) = F(\omega + 0) - F(\omega - 0)$. Finally, $F_{sc}(\omega)$ is a singular continuous function.

It is usually assumed that physically observed time series have a spectral
distribution function of the following form:


$F(\omega) = \sum_{\substack{\omega' \le \omega \\ \Delta F(\omega') > 0}} \Delta F(\omega') + \int_{-\infty}^{\omega} f(\omega')\, d\omega',$


where (i) the spectral density function $f(\omega)$ has the property that it is an
integrable non-negative function which is continuous except at a finite number of
points where it has finite left-hand and right-hand limits, and (ii) the set of
frequencies at which the spectral jump function (or spectral mass function)
$\Delta F(\omega)$ is positive contains at most countably many points distributed
on the real line in such a way that in any finite interval there are only a finite
number of points of positive spectral mass. If these conditions are satisfied, we
say that the time series has a mixed spectrum. If the spectral density function
vanishes for all $\omega$, we say that the time series has a discrete spectrum. If the
spectral jump function $\Delta F(\omega)$ vanishes for all $\omega$, we say that the time series has
a continuous spectrum.

In terms of the spectral distribution function one can characterize various
representations (or models) for a stationary time series $X(t)$. For example, it
may be shown that a discrete parameter time series with a mixed spectrum whose
spectral density function satisfies the condition


$\int_{-\pi}^{\pi} \log f(\omega)\, d\omega > -\infty$


may be written


(2.3)    $X(t) = \sum_{\nu} A_\nu e^{it\omega_\nu} + \sum_{\nu=0}^{\infty} c_\nu\, \eta(t - \nu)$


for suitable sequences of frequencies $\{\omega_\nu\}$, constants $\{c_\nu\}$, and uncorrelated
random variables $\{A_\nu\}$ and $\{\eta(\nu)\}$. In view of (2.3) one sees that the scheme of hidden
periodicities and the scheme of moving averages may be viewed as a special
kind of stationary process. Similarly, it may be shown that an autoregressive
scheme (where the $\eta_t$ are uncorrelated rather than independent) corresponds
to a stationary time series whose spectral distribution function is absolutely continuous
and whose spectral density function is of the form


$f(\omega) = \left| \sum_{k=0}^{m} b_k e^{ik\omega} \right|^{-2}$


for suitable constants $b_0, \ldots, b_m$. To prove these assertions, one uses the
Hilbert space representation theory described in Section 4.
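
The relation (2.2) between covariance function and spectrum can be checked numerically in a simple case. The sketch below takes an autoregressive scheme of order 1, $X_t = a X_{t-1} + \eta_t$ with shock variance $\sigma^2$, for which the standard results are $f(\omega) = \sigma^2 / (2\pi |1 - a e^{-i\omega}|^2)$ and $R(t) = \sigma^2 a^{|t|} / (1 - a^2)$. The order-1 scheme, the value $a = 0.6$, the $2\pi$ normalization, and the Riemann-sum quadrature are choices made for this illustration only.

```python
import cmath
import math


def ar1_spectral_density(omega, a, sigma2=1.0):
    """Spectral density of X_t = a X_{t-1} + eta_t, normalized to agree with (2.2)."""
    return sigma2 / (2.0 * math.pi * abs(1.0 - a * cmath.exp(-1j * omega)) ** 2)


def covariance_from_density(density, lag, n_grid=4096):
    """Approximate R(t) = integral over [-pi, pi] of exp(i t w) f(w) dw by a Riemann sum."""
    h = 2.0 * math.pi / n_grid
    total = 0.0 + 0.0j
    for k in range(n_grid):
        w = -math.pi + k * h
        total += cmath.exp(1j * lag * w) * density(w)
    return (total * h).real


a = 0.6
r0 = covariance_from_density(lambda w: ar1_spectral_density(w, a), 0)
r3 = covariance_from_density(lambda w: ar1_spectral_density(w, a), 3)
r3_closed_form = a ** 3 / (1.0 - a ** 2)  # sigma^2 * a^|t| / (1 - a^2) with sigma^2 = 1
```

The numerically integrated values agree with the closed form to within rounding error, which illustrates how an absolutely continuous spectral distribution determines the covariance function.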


3. The problem of minimum mean square error linear prediction. In order to
show existence and uniqueness, and to obtain conditions characterizing, the best
linear predictor, we need to introduce the notion of a Hilbert space. (For a
discussion of Hilbert space theory see any suitable text, such as Halmos [16].)

DEFINITION 3A. By an abstract Hilbert space is meant a set $H$ whose members
$u, v, \ldots$ are usually called vectors or points which possesses the following
properties.

(I) $H$ is a linear space [that is, for any vectors $u$ and $v$ in $H$, and real number
$a$, there exist vectors, denoted by $u + v$ and $au$ respectively, which satisfy the
usual algebraic properties of addition and multiplication; also there exists a
zero vector 0 with the usual properties under addition].

(II) $H$ is an inner product space [that is, to every pair of points $u$ and $v$ in
$H$ there corresponds a real number, written $(u, v)$ and called the inner product
of $u$ and $v$, possessing the following properties: for all points $u$, $v$, and $w$ in $H$,
and every real number $a$,

(i) $(au, v) = a(u, v)$

(ii) $(u + v, w) = (u, w) + (v, w)$

(iii) $(u, v) = (v, u)$

(iv) $(u, u) > 0$ if and only if $u \ne 0$].

(III) $H$ is a complete metric space under the norm $\|u\| = (u, u)^{1/2}$ [that is,
if $\{u_n\}$ is a sequence of points such that $\|u_m - u_n\| \to 0$ as $m, n \to \infty$ then there
is a vector $u$ in $H$ such that $\|u_n - u\| \to 0$ as $n \to \infty$].


In order to define the notion of the Hilbert space spanned by a time series, 
we first define the notion of the Hilbert space spanned by a family of vectors. 

DEFINITION 3B. Let $T$ be an index set, and let $\{u(t),\ t \in T\}$ be a family of
members of a Hilbert space $H$. The linear manifold spanned by the family
$\{u(t),\ t \in T\}$, denoted $L(u(t),\ t \in T)$, is defined to be the set consisting of all
vectors $u$ in $H$ which may be represented in the form $u = \sum_{i=1}^{n} c_i u(t_i)$ for some
integer $n$, some constants $c_1, \ldots, c_n$, and some points $t_1, \ldots, t_n$ in $T$. The
Hilbert space spanned by the family $\{u(t),\ t \in T\}$, denoted $V(u(t),\ t \in T)$ [or
$L_2(u(t),\ t \in T)$ if $H$ is the space of square integrable functions on some measure
space], is defined to be the set of vectors which either belong to the linear
manifold $L(u(t),\ t \in T)$ or may be represented as a limit of vectors in $L(u(t),\ t \in T)$.
If $V(u(t),\ t \in T)$ coincides with $H$, we say that $\{u(t),\ t \in T\}$ spans $H$.

DEFINITION 3C. The Hilbert space spanned by a time series $\{X(t),\ t \in T\}$,
denoted by $L_2(X(t),\ t \in T)$, is defined to consist of all random variables $U$ which
are either finite linear combinations of the random variables $\{X(t),\ t \in T\}$ or
are limits of such finite linear combinations in the norm corresponding to the
inner product defined on the space $L_2$ of square integrable random variables by


(3.1)    $(U, V) = E[UV].$


In words, $L_2(X(t),\ t \in T)$ consists of all linear functionals in the time series.
We next state without proof the projection theorem for an abstract Hilbert 
space. 
PROJECTION THEOREM. Let $H$ be an abstract Hilbert space, let $M$ be a Hilbert
subspace of $H$, let $v$ be a vector in $H$, and let $v^*$ be a vector in $M$. A necessary and
sufficient condition that $v^*$ is the unique vector in $M$ satisfying

(3.2)    $\|v^* - v\| = \min_{u \in M} \|u - v\|$

is that

(3.3)    $(v^*, u) = (v, u)$ for every $u$ in $M$.

The vector $v^*$ satisfying (3.2) is called the projection of $v$ onto $M$, and will here be
written $E^*[v \mid M]$.

In the case that $M$ is the Hilbert space spanned by a family of vectors
$\{x(t),\ t \in T\}$ in $H$, we write $E^*[v \mid x(t),\ t \in T]$ to denote the projection of $v$ onto
$M$. In this case, a necessary and sufficient condition that $v^*$ satisfy (3.3) is that

(3.4)    $(v^*, x(s)) = (v, x(s))$ for every $s$ in $T$.
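
The content of (3.4) can be seen in the simplest possible setting, where $H$ is ordinary Euclidean space with the usual inner product and the spanning family is a finite set of linearly independent vectors. The sketch below is an illustration under those assumptions; the helper names `inner`, `solve`, and `project` and the sample vectors are hypothetical. The projection is written as a linear combination of the family, and its coefficients are found by imposing (3.4).

```python
def inner(u, v):
    """Euclidean inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def project(v, family):
    """Projection of v onto the span of a linearly independent family.

    Writes v* = sum_j c_j x_j and chooses c so that (v*, x_i) = (v, x_i) for all i.
    """
    gram = [[inner(xi, xj) for xj in family] for xi in family]
    rhs = [inner(v, xi) for xi in family]
    coeffs = solve(gram, rhs)
    return [sum(c * x[k] for c, x in zip(coeffs, family)) for k in range(len(v))]


# Hypothetical example: project onto the plane spanned by two vectors in 3-space.
family = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
v = [3.0, 4.0, 5.0]
v_star = project(v, family)
residual = [a - b for a, b in zip(v, v_star)]
```

Here the projection is (3, 4, 0), and the residual $v - v^*$ has zero inner product with every member of the family, which is condition (3.4) rearranged.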

We are now in a position to solve the problem of obtaining an explicit
expression for the minimum mean square error linear prediction $E^*[Z \mid X(t),\ t \in T]$.
From (3.4), with $H$ equal to the Hilbert space $L_2$ of all square integrable random
variables, and $v = Z$, it follows that the optimum linear predictor is the unique
random variable in $L_2(X(t),\ t \in T)$ satisfying, for all $s$ in $T$,

(3.5)    $E\big[E^*[Z \mid X(t),\ t \in T]\, X(s)\big] = E[Z X(s)].$

Equation (3.5) may look more familiar if we consider the special case of an
interval $T = \{t : a \le t \le b\}$. If one writes heuristically

(3.6)    $\int_a^b X(t) w(t)\, dt$

to represent a random variable in $L_2(X(t),\ t \in T)$, then (3.5) states that the
weighting function $w^*(t)$ of the best linear predictor

(3.7)    $E^*[Z \mid X(t),\ t \in T] = \int_a^b w^*(t) X(t)\, dt$

must satisfy the generalized Wiener-Hopf equation

(3.8)    $\int_a^b w^*(t) K(s, t)\, dt = \rho_Z(s), \qquad a \le s \le b,$

where we define

(3.9)    $K(s, t) = E[X(s)X(t)],$
(3.10)   $\rho_Z(t) = E[Z X(t)].$

There is an extensive literature concerning the solution of the integral
equation in (3.8); see [39] for references. In my opinion, however, this literature is
concerned with an unnecessarily hard problem, as well as one in which the very
formulation of the problem makes it difficult to be rigorous. The integral
equation in (3.8) possesses a solution only if one interprets $w^*(t)$ as a generalized
function which includes terms which are Dirac delta functions and derivatives
of delta functions.

It seems to me that a simple reinterpretation of (3.8) avoids all these
difficulties. Let us not regard (3.8) as an integral equation for the weighting function
$w^*(t)$. Rather, let us compare (3.7) and (3.8). These equations say that if one
can find a representation for the function $\rho_Z(s)$ in terms of linear operations on the
functions $\{K(s, t),\ t \in T\}$, then the minimum mean square error linear predictor
$E^*[Z \mid X(t),\ t \in T]$ can be written in terms of the corresponding linear operations on
the time series $\{X(t),\ t \in T\}$. It should be emphasized that the most important
linear operations are integration and differentiation. Consequently, the problem
of finding the best linear predictor is not one of solving an integral equation, but
is one of hunting for a linear representation of $\rho_Z(t)$ in terms of the covariance
kernel $K(s, t)$. A general method of finding such representations will be
discussed in Sections 4 and 5.

We illustrate the ideas involved by considering a simple example. 

EXAMPLE 3A. Consider a stationary time series X(t), with covariance kernel

(3.11)   $K(s, t) = C e^{-\beta |s - t|}$,

which one has observed over a finite interval of time, $a \le t \le b$. Suppose that
one desires to predict X(b + c), for c > 0. Now, for $a \le t \le b$,

(3.12)   $\rho(t) = E[X(t) X(b + c)] = C e^{-\beta (b + c - t)} = e^{-\beta c} K(t, b)$.

In view of (3.12), by the interpretation of (3.7) and (3.8) just stated, it
follows that

         $E^*[X(b + c) \mid X(t), a \le t \le b] = e^{-\beta c} X(b)$.
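
The conclusion of Example 3A can be checked numerically on a finite grid. The following is a minimal sketch, assuming the series is observed only at equally spaced points of [a, b]; the grid, the parameter values, and the helper name `exp_kernel` are illustrative choices and do not come from the text. On the grid, the analogue of (3.8) is a linear system, and its solution should place weight $e^{-\beta c}$ on X(b) and zero weight elsewhere.

```python
import numpy as np

def exp_kernel(s, t, C=1.0, beta=0.7):
    """Covariance kernel K(s, t) = C * exp(-beta * |s - t|)."""
    return C * np.exp(-beta * np.abs(s - t))

# Assumed setting: observations on a grid in [a, b]; predict X(b + c).
a, b, c = 0.0, 2.0, 0.5
C, beta = 1.0, 0.7
t = np.linspace(a, b, 41)

K = exp_kernel(t[:, None], t[None, :], C, beta)   # K(s, t) on the grid
rho = exp_kernel(t, b + c, C, beta)               # rho(t) = E[X(t) X(b + c)]

# Discrete analogue of (3.8): sum over t of w(t) K(s, t) = rho(s).
w = np.linalg.solve(K, rho)

# Since rho = exp(-beta * c) * K(., b), all weight should sit on the last point.
expected = np.zeros_like(t)
expected[-1] = np.exp(-beta * c)
print(np.max(np.abs(w - expected)))
```

The point of the example is that no equation needs to be solved once the representation (3.12) has been recognized; the linear solve above is only there to confirm it.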


4. Hilbert space representations of time series. In the decade of the 1940’s, 
probabilists began to employ Hilbert space methods to clarify the structure of 
time series (see [21] and [27]). Among the fundamental theorems proved in this 
period were the spectral representation theorem for stationary time series, and 
the Karhunen-Loève representation for random functions of second order on a
finite interval. Various workers (especially Grenander [12]) have made use of
these representation theorems in treating problems of statistical inference on
time series. A representation theorem which does not seem to have found any
application is one due to Loève ([27], p. 338) which shows that there is a very
intimate connection between time series (random functions of second order) and 
reproducing kernel Hilbert spaces. It turns out, in my opinion, that reproducing 
kernel Hilbert spaces are the natural setting in which to solve problems of sta- 
tistical inference on time series. In this section we define the notion of a Hilbert 
space representation of a time series and show how this notion may be used to 
explicitly solve the prediction problem. 

The definition we give of the notion of a Hilbert space representation of a
time series is based on the following theorem (for proof, see Parzen [37] or [40]).

BASIC CONGRUENCE THEOREM. Let $H_1$ and $H_2$ be two abstract Hilbert spaces.
Denote the inner product between two vectors $u_1$ and $u_2$ in $H_1$ by $(u_1, u_2)_1$.
Similarly, denote the inner product between two vectors $v_1$ and $v_2$ in $H_2$ by
$(v_1, v_2)_2$. Let T be an index set. Let $\{u(t), t \in T\}$ be a family of vectors which
span $H_1$. Similarly, let $\{v(t), t \in T\}$ be a family of vectors which span $H_2$.
Suppose that, for every s and t in T,

(4.1)    $(u(s), u(t))_1 = (v(s), v(t))_2$.

Then there exists a congruence (a one-one inner product preserving linear mapping)
$\psi$ from $H_1$ onto $H_2$ which has the property that

(4.2)    $\psi(u(t)) = v(t)$,   t in T.

DEFINITION 4A. A family of vectors $\{f(t), t \in T\}$ in a Hilbert space H is said to
be a representation of a time series $\{X(t), t \in T\}$ if, for every s and t in T,

(4.3)    $(f(s), f(t))_H = K(s, t) = E[X(s) X(t)]$.

Then there is a congruence (a one-one inner product preserving linear mapping)
$\psi$ from $L_2(f(t), t \in T)$ onto $L_2(X(t), t \in T)$ satisfying

(4.4)    $\psi(f(t)) = X(t)$

and every random variable U in $L_2(X(t), t \in T)$ may be written

(4.5)    $U = \psi(g)$

for some unique vector g in $L_2(f(t), t \in T)$.

We next show that the representation of a time series as a stochastic integral 
is best viewed as a Hilbert space representation. 

DEFINITION 4B. We call $(Q, \mathcal{B}, \mu)$ a measure space if Q is a set, $\mathcal{B}$ is a
$\sigma$-field of subsets of Q, and $\mu$ is a measure on the measurable space
$(Q, \mathcal{B})$. We denote by $L_2(Q, \mathcal{B}, \mu)$ the Hilbert space of all
$\mathcal{B}$-measurable real valued functions f defined on Q satisfying

(4.6)    $(f, f)_\mu = \int_Q f^2\, d\mu < \infty$.


DEFINITION 4C. Let $(Q, \mathcal{B}, \mu)$ be a measure space, and, for every B in $\mathcal{B}$,
let Z(B) be a random variable. The family of random variables $\{Z(B), B \in \mathcal{B}\}$
is called an orthogonal random set function with covariance kernel $\mu$ if, for any
two sets $B_1$ and $B_2$ in $\mathcal{B}$,

(4.7)    $E[Z(B_1) Z(B_2)] = \mu(B_1 B_2)$,

where, as usual, $B_1 B_2$ denotes the intersection of $B_1$ and $B_2$.

The Hilbert space $L_2(Z(B), B \in \mathcal{B})$ of random variables spanned by an
orthogonal random set function may be defined, as was the Hilbert space spanned
by a time series, to be the smallest Hilbert subspace of the Hilbert space of all
square integrable random variables containing all random variables U of the form
$U = \sum_{i=1}^{n} c_i Z(B_i)$ for some integer n, subfamily
$\{B_1, \ldots, B_n\} \subset \mathcal{B}$, and real constants $c_1, \ldots, c_n$. On the other
hand, $L_2(Q, \mathcal{B}, \mu)$ may be described as the Hilbert space spanned under the
norm (4.6) by the family of indicator functions $\{I_B, B \in \mathcal{B}\}$, where the
indicator function $I_B$ of B is defined by $I_B(q) = 1$ or 0 according as $q \in B$ or
$q \notin B$. Now for any $B_1$ and $B_2$ in $\mathcal{B}$,

(4.8)    $(I_{B_1}, I_{B_2})_\mu = \mu(B_1 B_2) = E[Z(B_1) Z(B_2)]$.

Therefore, by the Basic Congruence Theorem, there is a congruence $\psi$ from
$L_2(Q, \mathcal{B}, \mu)$ onto $L_2(Z(B), B \in \mathcal{B})$ such that for any $B \in \mathcal{B}$,

(4.9)    $\psi(I_B) = Z(B)$.


This fact justifies the following definition of the stochastic integral. 

DEFINITION 4D. Let $(Q, \mathcal{B}, \mu)$ be a measure space and let $\{Z(B), B \in \mathcal{B}\}$
be an orthogonal random set function with covariance kernel $\mu$. For any function f
in $L_2(Q, \mathcal{B}, \mu)$ one defines the stochastic integral of f with respect to
$\{Z(B), B \in \mathcal{B}\}$, denoted $\int_Q f\, dZ$, by

(4.10)   $\int_Q f\, dZ = \psi(f)$,

where $\psi$ is the congruence from $L_2(Q, \mathcal{B}, \mu)$ onto $L_2(Z(B), B \in \mathcal{B})$
determined by (4.9).

We are now in a position to state our version of Karhunen’s theorem (see 
[13], p. 29). 

THEOREM 4A. Let $\{X(t), t \in T\}$ be a time series with covariance kernel K. Let
$\{f(t), t \in T\}$ be a family of functions in a space $L_2(Q, \mathcal{B}, \mu)$, such that
for all s, t in T

(4.11)   $K(s, t) = \int_Q f(s) f(t)\, d\mu$.

Then $\{f(t), t \in T\}$ is a representation for $\{X(t), t \in T\}$.
If, further, $\{f(t), t \in T\}$ spans $L_2(Q, \mathcal{B}, \mu)$, then there is an orthogonal
random set function $\{Z(B), B \in \mathcal{B}\}$ with covariance kernel $\mu$ such that

(4.12)   $X(t) = \int_Q f(t)\, dZ$,

and every random variable U in $L_2(X(t), t \in T)$ may be represented

(4.13)   $U = \int_Q g\, dZ$

for some unique function g in $L_2(Q, \mathcal{B}, \mu)$.

PROOF. Let $\psi$ be the congruence from $L_2(f(t), t \in T)$ onto $L_2(X(t), t \in T)$
satisfying (4.4). If $\{f(t), t \in T\}$ spans $L_2(Q, \mathcal{B}, \mu)$, define, for
$B \in \mathcal{B}$, $Z(B) = \psi(I_B)$. It is immediate that $\{Z(B), B \in \mathcal{B}\}$ is an
orthogonal random set function with covariance kernel $\mu$. By the definition of the
stochastic integral, (4.12) is merely another way of writing the fact that
$X(t) = \psi(f(t))$.

Theorem 4A, together with (2.2) and (2.1), yields the following fundamental
result.

SPECTRAL REPRESENTATION THEOREM FOR STATIONARY TIME SERIES. A
discrete parameter time series $\{X(t), t = 0, \pm 1, \ldots\}$ is weakly stationary if and
only if for some Lebesgue-Stieltjes measure $\mu$ on the interval
$Q = \{\lambda : -\pi \le \lambda \le \pi\}$ the complex exponentials
$\{e^{i\lambda t}, t = 0, \pm 1, \ldots\}$ form a representation for the time series in
$L_2(Q, \mathcal{B}, \mu)$, where $\mathcal{B}$ is the $\sigma$-field of Borel subsets of Q. Then
there exists an orthogonal random set function $\{Z(B), B \in \mathcal{B}\}$ such that

(4.14)   $X(t) = \int_{-\pi}^{\pi} e^{i\lambda t}\, Z(d\lambda), \qquad t = 0, \pm 1, \ldots$.

A similar theorem holds for continuous parameter time series with
$Q = \{\lambda : -\infty < \lambda < \infty\}$.


The representation of a time series as an integral with respect to an orthogonal
random set function is not a natural representation, since one may choose such
representations of a time series in a multitude of ways. Indeed, if
$(Q, \mathcal{B}, \mu)$ is a measure space such that $L_2(X(t), t \in T)$ and
$L_2(Q, \mathcal{B}, \mu)$ have the same dimension, there are many families
$\{f(t), t \in T\}$ of functions in $L_2(Q, \mathcal{B}, \mu)$ which are a representation for
$\{X(t), t \in T\}$. What one desires is a family $\{f(t), t \in T\}$ of familiar functions
[such as the family of complex exponentials $e^{i\lambda t}$, which are a
representation in a suitable space $L_2(Q, \mathcal{B}, \mu)$ for a stationary time
series]. I believe there is a natural representation in terms of which to solve
problems of statistical inference on time series, namely the representation of a
time series with covariance kernel K by the functions $\{K(\cdot, t), t \in T\}$ in the
reproducing kernel Hilbert space H(K).

DEFINITION 4E. A Hilbert space H is said to be a reproducing kernel Hilbert
space, with reproducing kernel K, if the members of H are functions on some set
T, and if there is a kernel K on $T \otimes T$ having the following two properties; for
every t in T (where $K(\cdot, t)$ is the function defined on T, with value at s in T
equal to K(s, t)):

(4.15)   $K(\cdot, t) \in H$,
(4.16)   $(g, K(\cdot, t))_H = g(t)$

for every g in H.

Intuitively, a reproducing kernel Hilbert space is a Hilbert space which
contains a function playing the role of the Dirac delta function $\delta(t)$. It should
be recalled that, for square integrable functions $f(\cdot)$,

         $\int_{-\infty}^{\infty} f(s)\, \delta(s - t)\, ds = f(t)$.

Consequently, the kernel $K(s, t) = \delta(s - t)$ satisfies (4.16). However it does
not satisfy (4.15), and therefore is not truly a reproducing kernel.

THEOREM 4B (Moore-Aronszajn-Loève [1], [27]). The covariance kernel K of a
time series generates a unique Hilbert space, which we denote by H(K), of which
K is the reproducing kernel.

Since $K(s, t) = (K(\cdot, s), K(\cdot, t))_{H(K)} = E[X(s) X(t)]$ we immediately obtain
the following important theorem.

THEOREM 4C. Let $\{X(t), t \in T\}$ be a time series with covariance kernel K. Then
the family $\{K(\cdot, t), t \in T\}$ of functions in H(K) is a representation for
$\{X(t), t \in T\}$. Given a function g in H(K), we denote by $(X, g)_K$ or $(g, X)_K$ the
random variable U in $L_2(X(t), t \in T)$ which corresponds to g under the congruence
which maps $K(\cdot, t)$ into X(t). We then have the following formal relations: for
every t in T, and g, h in H(K),

(4.17)   $(X, K(\cdot, t))_K = X(t)$,
         $E[(X, h)_K (X, g)_K] = (h, g)_K$,

where we hereafter write $(h, g)_K$ for $(h, g)_{H(K)}$.

The next theorem shows the relationship between the reproducing kernel 
Hilbert space representation of a time series, and the representation of a time 
series by an orthogonal decomposition of the form of (4.12). 

THEOREM 4D. Let K be a covariance kernel. If there exist a measure
space $(Q, \mathcal{B}, \mu)$, and a family of functions $\{f(t), t \in T\}$ in
$L_2(Q, \mathcal{B}, \mu)$ such that (4.11) holds, then the reproducing kernel Hilbert space
H(K) corresponding to the covariance kernel K may be described as follows: H(K)
consists of all functions g, defined on T, which may be represented in the form

(4.18)   $g(t) = \int_Q g^* f(t)\, d\mu$

for some (necessarily unique) function $g^*$ in the Hilbert subspace
$L_2(f(t), t \in T)$ of $L_2(Q, \mathcal{B}, \mu)$ spanned by the family of functions
$\{f(t), t \in T\}$, with norm given by

(4.19)   $\|g\|_K^2 = \int_Q (g^*)^2\, d\mu$.

If $\{f(t), t \in T\}$ spans $L_2(Q, \mathcal{B}, \mu)$, so that X(t) has an orthogonal
decomposition (4.12), then we may write

(4.20)   $(X, g)_K = \int_Q g^*\, dZ$.

PROOF. Verify that the set H of functions of the form of (4.18), with norm
given by (4.19), is a Hilbert space satisfying (4.15) and (4.16).

THEOREM 4E. (General solution of the prediction problem.) Let $\{X(t), t \in T\}$
be a time series with covariance kernel K(s, t), and let H(K) be the corresponding
reproducing kernel Hilbert space. Between $L_2(X(t), t \in T)$ and H(K) there exists a
one-one inner product preserving linear mapping under which X(t) and $K(\cdot, t)$ are
mapped into one another. Denote by $(h, X)_K$ the random variable in
$L_2(X(t), t \in T)$ which corresponds under the mapping to the function $h(\cdot)$ in
H(K). Then the general solution to the prediction problem may be written as
follows. If Z is a random variable with finite second moment, and if

(4.21)   $\rho_Z(t) = E[Z X(t)]$,

then

(4.22)   $E^*[Z \mid X(t), t \in T] = (\rho_Z, X)_K$

with mean square error of prediction given by

(4.23)   $E\bigl[|Z - E^*[Z \mid X(t), t \in T]|^2\bigr] = E|Z|^2 - (\rho_Z, \rho_Z)_K$.



Theorem 4E represents a coordinate free solution of the prediction problem. 
The usual methods of explicitly writing optimum predictors, using either eigen- 
function expansions, Green’s functions (impulse response function), or (power) 
spectral density functions, are merely methods of writing down the reproducing 
kernel inner product corresponding to the covariance kernel K(s, t) of the ob- 
served time series. 

The validity of Theorem 4E follows immediately from the definition of the
concepts involved. However, it may be instructive to give a proof of the theorem,
using the following properties of the mapping $(h, X)_K$. For any functions g and
h in H(K) and random variables Z with finite second moment it holds that

(4.24)   $E[(h, X)_K (g, X)_K] = (h, g)_K$,
(4.25)   $E[Z (h, X)_K] = (\rho_Z, h)_K$,

in which $\rho_Z(t) = E[Z X(t)]$. Now a random variable in $L_2(X(t), t \in T)$ may be
written $(h, X)_K$ for some h in H(K). Consequently the mean square error between
any linear functional $(h, X)_K$ and Z may be written

(4.26)   $E[|(h, X)_K - Z|^2] = E[(h, X)_K^2] + E[Z^2] - 2 E[Z (h, X)_K]$
         $\qquad = E[Z^2] + (h, h)_K - 2 (\rho_Z, h)_K$
         $\qquad = E[Z^2] - (\rho_Z, \rho_Z)_K + (h - \rho_Z, h - \rho_Z)_K$.

From (4.26) it is immediate that $(\rho_Z, X)_K$ is the minimum mean square error
linear predictor of Z, with mean square prediction error equal to
$E[Z^2] - (\rho_Z, \rho_Z)_K$. The proof of Theorem 4E is complete.
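
For a finite index set the statements (4.22) and (4.23) reduce to matrix algebra, which makes them easy to check. The sketch below assumes a hypothetical joint second-moment matrix `S` for three observations and a target Z, and uses $(f, g)_K = f' K^{-1} g$, the finite-dimensional form of the reproducing kernel inner product; the matrix entries and the function names are illustrative assumptions. The error is computed directly from `S`, without using (4.26), and then compared with the right side of (4.23).

```python
import numpy as np

# Hypothetical joint second-moment matrix S of (X(1), X(2), X(3), Z);
# S = A'A is symmetric positive definite by construction.
A = np.array([[2.0, 0.3, 0.1, 0.5],
              [0.0, 1.5, 0.4, 0.2],
              [0.0, 0.0, 1.0, 0.7],
              [0.0, 0.0, 0.0, 0.9]])
S = A.T @ A
K = S[:3, :3]      # K(s, t) = E[X(s) X(t)]
rho = S[:3, 3]     # rho_Z(t) = E[Z X(t)]
EZ2 = S[3, 3]      # E[Z^2]

def rk_inner(f, g):
    """(f, g)_K = f' K^{-1} g for a finite index set."""
    return f @ np.linalg.solve(K, g)

def mse_direct(h):
    """E|(h, X)_K - Z|^2 computed directly from S."""
    c = np.linalg.solve(K, h)    # (h, X)_K = c'X with c = K^{-1} h
    v = np.append(c, -1.0)       # coefficients of c'X - Z on (X, Z)
    return v @ S @ v

best = mse_direct(rho)                   # error of the predictor (rho_Z, X)_K
claimed = EZ2 - rk_inner(rho, rho)       # right side of (4.23)
perturbed = [mse_direct(rho + d) for d in 0.1 * np.eye(3)]
print(best, claimed, min(perturbed))
```

Any perturbation of h away from $\rho_Z$ increases the error by $(h - \rho_Z, h - \rho_Z)_K$, which is the last term of (4.26).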


5. Examples of reproducing kernel Hilbert space representations. In this
section we give the reproducing kernel Hilbert space representation of a time
series $\{X(t), t \in T\}$ under a variety of standard assumptions.

EXAMPLE 5A. Suppose $T = \{1, 2, \ldots, N\}$ for some positive integer N, and
that the covariance kernel K is given by a symmetric positive definite matrix
$\{K_{st}\}$ with inverse $\{K^{st}\}$. The corresponding reproducing kernel space H(K)
consists of all N-dimensional vectors $f = (f_1, \ldots, f_N)$ with inner product

(5.1)    $(f, g)_K = \sum_{s,t=1}^{N} f_s K^{st} g_t$.

To prove (5.1) one need only verify that the reproducing property holds: for
$u = 1, \ldots, N$,

         $(f, K_{\cdot u})_K = \sum_{s,t=1}^{N} f_s K^{st} K_{tu} = \sum_{s=1}^{N} f_s\, \delta(s, u) = f_u$.

The inner product may also be written as a ratio of determinants:

(5.2)    $(f, g)_K = - \begin{vmatrix} K_{11} & \cdots & K_{1N} & f_1 \\ \vdots & & \vdots & \vdots \\ K_{N1} & \cdots & K_{NN} & f_N \\ g_1 & \cdots & g_N & 0 \end{vmatrix} \div \begin{vmatrix} K_{11} & \cdots & K_{1N} \\ \vdots & & \vdots \\ K_{N1} & \cdots & K_{NN} \end{vmatrix}$.



To prove (5.2) one again need only verify the reproducing property. In the case 
in which the covariance matrix K is singular, one may define the corresponding 
reproducing kernel inner product in terms of the pseudo-inverse of the matrix K 
(see Greville [15] for a discussion of the notion of pseudo-inverse ). 

Although (5.1) provides a formula for the reproducing kernel inner product 
in terms of the inverse of the covariance matrix, it is to be emphasized that one 
need not necessarily invert the covariance matrix in order to find the reproducing 
kernel inner product. The point of introducing the reproducing kernel inner 
product is that the inversion of the covariance matrix is usually an intractable 
problem, and one should look instead to evaluate that for which one would use 
the inverse $K^{-1}$; namely, the evaluation of inner products in the reproducing
kernel space. Various iterative methods of evaluating these inner products can 
be given (see [39] or [40]). This observation is undoubtedly not as important in 
the case of discrete parameter time series as it is in the case of multiple time 
series and continuous parameter time series. 
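
As one illustration of evaluating the inner product (5.1) without forming $K^{-1}$, the sketch below uses a Cholesky factorization. This is an assumed stand-in, not one of the iterative methods of [39] or [40]; the kernel $K(s, t) = \min(s, t)$ and the function name `rk_inner` are illustrative choices. The check is the reproducing property $(f, K_{\cdot u})_K = f_u$.

```python
import numpy as np

def rk_inner(K, f, g):
    """(f, g)_K = f' K^{-1} g, computed from the Cholesky factor K = L L'."""
    L = np.linalg.cholesky(K)
    u = np.linalg.solve(L, f)      # u = L^{-1} f
    v = np.linalg.solve(L, g)      # v = L^{-1} g
    return u @ v                   # u'v = f' K^{-1} g

N = 6
idx = np.arange(1, N + 1)
# Assumed example kernel: K(s, t) = min(s, t), which is positive definite.
K = np.minimum(idx[:, None], idx[None, :]).astype(float)
f = np.sin(idx)

# Reproducing property: (f, K(., u))_K should return f(u) for every u.
reproduced = np.array([rk_inner(K, f, K[:, u]) for u in range(N)])
print(np.max(np.abs(reproduced - f)))
```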

EXAMPLE 5B. Autoregressive schemes (discrete parameter). A discrete parameter
weakly stationary time series X(t) is said to satisfy an autoregressive scheme of
order m if X(t) is the solution of the stochastic difference equation

(5.3)    $L_t X(t) = \sum_{k=0}^{m} a_k X(t - k) = \eta(t)$

where $a_0, \ldots, a_m$ are given constants, and $\{\eta(t)\}$ is an orthonormal sequence
of random variables. We now show that given observations
$\{X(t), t = 1, 2, \ldots, N\}$ the reproducing kernel Hilbert space H(K) corresponding
to the covariance kernel K of the observations consists of all N-vectors
$f = (f(1), \ldots, f(N))$ with inner product given by

(5.4)    $(f, g)_K = \sum_{t=m+1}^{N} \{L_t f(t)\}\{L_t g(t)\} + \sum_{j,k=1}^{m} d_{jk} f(j) g(k)$

where the matrix $D = \{d_{jk}\}$ has an inverse $D^{-1} = \{d^{jk}\}$ with general term

(5.5)    $d^{jk} = K(j - k) = E[X(j) X(k)]$.
In the case that $N \ge 2m$, an explicit expression for $d_{jk}$ is given by

(5.6)    $d_{jk} = \sum_{u=1}^{\min(j,k)} \{a_{j-u} a_{k-u} - a_{u+m-j} a_{u+m-k}\}$.

In particular for a first order autoregressive scheme and $N \ge 2$

(5.7)    $(f, g)_K = (a_0^2 - a_1^2) f(1) g(1)$
         $\qquad + \sum_{t=2}^{N} \{a_0 f(t) + a_1 f(t - 1)\}\{a_0 g(t) + a_1 g(t - 1)\}$.

For a second order autoregressive scheme and $N \ge 4$

(5.8)    $(f, g)_K = (a_0^2 - a_2^2)\{f(1) g(1) + f(2) g(2)\}$
         $\qquad + (a_0 a_1 - a_1 a_2)\{f(1) g(2) + f(2) g(1)\}$
         $\qquad + \sum_{t=3}^{N} \{a_0 f(t) + a_1 f(t - 1) + a_2 f(t - 2)\}\{a_0 g(t) + a_1 g(t - 1) + a_2 g(t - 2)\}$.
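
The first order formula can be compared numerically with the general expression $f' K^{-1} g$ of Example 5A. The following sketch assumes the scheme $a_0 X(t) + a_1 X(t - 1) = \eta(t)$ with $|a_1| < |a_0|$, for which the stationary covariance is $K(s - t) = \phi^{|s - t|} / (a_0^2 - a_1^2)$ with $\phi = -a_1 / a_0$; the parameter values and the function name `ar1_inner` are illustrative.

```python
import numpy as np

def ar1_inner(f, g, a0, a1):
    """First order inner product: boundary term plus the sum of L_t f(t) L_t g(t)."""
    Lf = a0 * f[1:] + a1 * f[:-1]      # L_t f(t) for t = 2, ..., N
    Lg = a0 * g[1:] + a1 * g[:-1]
    return (a0**2 - a1**2) * f[0] * g[0] + Lf @ Lg

a0, a1, N = 1.0, -0.6, 8
phi = -a1 / a0                                   # X(t) = phi X(t-1) + eta(t) / a0
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
K = phi**lags / (a0**2 - a1**2)                  # stationary covariance matrix

f = np.cos(np.arange(1, N + 1))
g = np.arange(1, N + 1, dtype=float)

via_formula = ar1_inner(f, g, a0, a1)
via_inverse = f @ np.linalg.solve(K, g)          # f' K^{-1} g
print(via_formula, via_inverse)
```

The formula needs only the m + 1 coefficients of the scheme, whereas the direct route requires solving an N by N system.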


One can give a purely algebraic proof of (5.4). However a simpler proof can
be given if one uses certain facts from probability theory. Let us suppose that
$X(1), \ldots, X(N)$ are jointly normally distributed random variables with
covariance matrix $K_N = \{K_{st}\}$ with inverse matrix $K_N^{-1} = \{K^{st}\}$. Then the
joint probability density function of $X(1), \ldots, X(N)$ may be written

(5.9)    $f_{X(1), \ldots, X(N)}(x_1, \ldots, x_N) = \{(2\pi)^N |K_N|\}^{-1/2} \exp\{-\tfrac{1}{2} (x, x)_{K_N}\}$

where $|K_N|$ is the determinant of $K_N$, and the inner product $(x, x)_{K_N}$ is defined
by the right hand side of (5.1). On the other hand, if $X(1), \ldots, X(N)$ satisfy
the difference equation $L_t X(t) = \eta(t)$, where $\eta(1), \ldots, \eta(N)$ are independent
normal random variables with means 0 and variance 1, then

(5.10)   $f_{X(1), \ldots, X(m), \eta(m+1), \ldots, \eta(N)}(x_1, \ldots, x_m, y_{m+1}, \ldots, y_N)$
         $\qquad = \{(2\pi)^N |K_m|\}^{-1/2} \exp\bigl\{-\tfrac{1}{2} \bigl[(x, x)_{K_m} + \sum_{j=m+1}^{N} y_j^2\bigr]\bigr\}$.

Transforming from
$(X(1), \ldots, X(m), \eta(m + 1), \ldots, \eta(N))$ to $(X(1), \ldots, X(N))$
by the linear transformation $L_t X(t) = \eta(t)$, $t = m + 1, \ldots, N$, whose Jacobian
is the constant $|a_0|^{N-m}$, it follows from (5.10) that

(5.11)   $f_{X(1), \ldots, X(N)}(x_1, \ldots, x_N)$
         $\qquad = |a_0|^{N-m} \{(2\pi)^N |K_m|\}^{-1/2} \exp\bigl\{-\tfrac{1}{2} \bigl[(x, x)_{K_m} + \sum_{j=m+1}^{N} |L_j x|^2\bigr]\bigr\}$.

Comparing the exponents in (5.9) and (5.11) it follows that for any N-vector x

(5.12)   $(x, x)_{K_N} = (x, x)_{K_m} + \sum_{j=m+1}^{N} |L_j x|^2$,

which is equivalent to (5.4).

To prove (5.6), define the function $e_j(t)$ by $e_j(t) = 1$ or 0 according as t = j
or $t \ne j$. Since the time series X(t) is stationary,

(5.13)   $(e_j, e_k)_K = (e_{N-j+1}, e_{N-k+1})_K$.

For $1 \le j, k \le m$, defining $a_j = 0$ for j < 0 or j > m,

(5.14)   $(e_{N-j+1}, e_{N-k+1})_K = \sum_{t=m+1}^{N} L_t(e_{N-j+1})\, L_t(e_{N-k+1})$
         $\qquad = \sum_{t=m+1}^{N} a_{j+t-N-1}\, a_{k+t-N-1}$
         $\qquad = \sum_{u=1}^{N-m} a_{j-u}\, a_{k-u}$

while

(5.15)   $(e_j, e_k)_K = d_{jk} + \sum_{t=m+1}^{N} L_t(e_j)\, L_t(e_k)$
         $\qquad = d_{jk} + \sum_{t=m+1}^{N} a_{t-j}\, a_{t-k}$
         $\qquad = d_{jk} + \sum_{u=1}^{N-m} a_{u+m-j}\, a_{u+m-k}$.

From (5.13), (5.14), and (5.15), we obtain (5.6).

From (5.4) and (5.6) one may obtain the inverse matrix of the covariance 
matrix of an autoregressive scheme (see Siddiqui [49] and references cited there ). 

EXAMPLE 5C: Autoregressive schemes (continuous parameter). We next consider
the reproducing kernel Hilbert space corresponding to the covariance kernel of
an autoregressive scheme X(t) observed over a finite interval $a \le t \le b$.

A continuous parameter stationary time series X(t) is said to be an
autoregressive scheme of order m if its covariance function
$R(u) = E[X(t) X(t + u)]$ may be written (see Doob [D1], p. 542)

(5.16)   $R(s - t) = \int_{-\infty}^{\infty} \frac{e^{i(s - t)\omega}}{2\pi \bigl|\sum_{k=0}^{m} a_k (i\omega)^{m-k}\bigr|^2}\, d\omega$

where the polynomial $\sum_{k=0}^{m} a_k z^{m-k}$ has no zeros in the right half of the
complex z-plane. It may be shown that given observations of such a time series over
a finite interval $a \le t \le b$, the corresponding reproducing kernel Hilbert space
contains all functions h(t) on $a \le t \le b$ which are continuously differentiable of
order m. The reproducing kernel inner product is given by

(5.17)   $(h, g)_K = \int_a^b (L_t h)(L_t g)\, dt + \sum_{j,k=0}^{m-1} d_{jk}\, h^{(j)}(a)\, g^{(k)}(a)$

where

(5.18)   $L_t h = \sum_{k=0}^{m} a_k h^{(m-k)}(t)$

and, in analogy with (5.5), the matrix $\{d_{jk}\}$ is the inverse of the matrix
$\{d^{jk}\}$ with general term

(5.19)   $d^{jk} = \left. \frac{\partial^{j+k}}{\partial t^j\, \partial u^k} R(t - u) \right|_{t=a,\, u=a}$.

The first and second order autoregressive schemes are of particular importance.

A stationary time series X(t) is said to satisfy a first order autoregressive
scheme if it is the solution of a first order linear differential equation whose input
is white noise $\eta'(t)$ (the symbolic derivative of a process $\eta(t)$ with independent
stationary increments):

(5.20)   $(dX/dt) + \beta X = \eta'(t)$.

It should be remarked that from a mathematical point of view (5.20) should be
written

(5.21)   $dX(t) + \beta X(t)\, dt = d\eta(t)$.

Even then, by saying that X(t) satisfies (5.20) or (5.21) we mean that

(5.22)   $X(t) = \int_{-\infty}^{t} H(t - s)\, d\eta(s)$

where $H(t - s) = e^{-\beta (t - s)}$ is the one-sided Green's function of the differential
operator $L_t f = f'(t) + \beta f(t)$.

The covariance function of the stationary time series X(t) is

(5.23)   $R(t - u) = (1/2\beta)\, e^{-\beta |t - u|}$.

The corresponding reproducing kernel Hilbert space H(K) contains all
differentiable functions. The inner product is given by

(5.24)   $(h, g)_K = \int_a^b (h' + \beta h)(g' + \beta g)\, dt + 2\beta\, h(a) g(a)$.


More generally, corresponding to the covariance function

(5.25)   $K(s, t) = C e^{-\beta |s - t|}$

the reproducing kernel inner product is

(5.26)   $(h, g)_K = \frac{1}{2\beta C} \Bigl\{ \int_a^b (h' + \beta h)(g' + \beta g)\, dt + 2\beta\, h(a) g(a) \Bigr\}$
         $\qquad = \frac{1}{2\beta C} \int_a^b (h' g' + \beta^2 h g)\, dt + \frac{1}{2C} \{h(a) g(a) + h(b) g(b)\}$.
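
The second form of (5.26) can be tested against the reproducing property $(h, K(\cdot, t_0))_K = h(t_0)$ by numerical quadrature. The sketch below is only a sanity check under assumed values of a, b, $\beta$, C and $t_0$, with an arbitrary smooth test function h; the integral is split at $t_0$ because $K(\cdot, t_0)$ has a kink there, and the trapezoidal rule is written out by hand.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

# Assumed values for illustration.
a, b, beta, C, t0 = 0.0, 1.0, 1.5, 2.0, 0.4

h = lambda s: np.sin(3.0 * s) + s**2                  # arbitrary smooth test function
dh = lambda s: 3.0 * np.cos(3.0 * s) + 2.0 * s        # its derivative
g = lambda s: C * np.exp(-beta * np.abs(s - t0))      # g = K(., t0)
dg = lambda s, side: -side * beta * g(s)              # side = -1 left of t0, +1 right of t0

def rk_inner_h_g():
    """(h, g)_K with the integral split at the kink of g."""
    total = 0.0
    for lo, hi, side in ((a, t0, -1.0), (t0, b, 1.0)):
        s = np.linspace(lo, hi, 20001)
        total += trapezoid(dh(s) * dg(s, side) + beta**2 * h(s) * g(s), s)
    boundary = h(a) * g(a) + h(b) * g(b)
    return total / (2.0 * beta * C) + boundary / (2.0 * C)

value = rk_inner_h_g()
print(value, h(t0))
```

The jump of the derivative of $K(\cdot, t_0)$ at $t_0$ is what produces the point value $h(t_0)$; the boundary terms cancel the contributions from the endpoints a and b.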


The random variable $(h, X)_K$ in $L_2(X(t), a \le t \le b)$ corresponding to $h(\cdot)$ in
H(K) may be written

(5.27)   $(h, X)_K = \frac{1}{2\beta C} \Bigl\{ \beta^2 \int_a^b h(t) X(t)\, dt + \int_a^b h'(t)\, dX(t) \Bigr\}$
         $\qquad + \frac{1}{2C} \{h(a) X(a) + h(b) X(b)\}$.

Note that X'(t) does not exist in any rigorous sense; consequently we write dX(t)
where X'(t) dt seems to be called for. It can be shown that (5.27) makes sense.
In the case that $h(\cdot)$ is twice differentiable, one may integrate by parts and write

(5.28)   $\int_a^b h'(t)\, dX(t) = h'(b) X(b) - h'(a) X(a) - \int_a^b X(t) h''(t)\, dt$.



A stationary time series X(t) is said to satisfy a second order autoregressive
scheme if it is the solution of a second order linear differential equation whose
input is white noise $\eta'(t)$:

(5.29)   $(d^2 X/dt^2) + 2\alpha (dX/dt) + \gamma^2 X = \eta'(t)$.

If $\omega^2 = \gamma^2 - \alpha^2 > 0$, the covariance function of the time series is

(5.30)   $R(t - u) = \frac{e^{-\alpha |u - t|}}{4\alpha\gamma^2} \Bigl\{ \cos \omega(u - t) + \frac{\alpha}{\omega} \sin \omega|u - t| \Bigr\}$.

The corresponding reproducing kernel Hilbert space contains all twice
differentiable functions on the interval $a \le t \le b$ with inner product

(5.31)   $(h, g)_K = \int_a^b (h'' + 2\alpha h' + \gamma^2 h)(g'' + 2\alpha g' + \gamma^2 g)\, dt$
         $\qquad + 4\alpha\gamma^2\, h(a) g(a) + 4\alpha\, h'(a) g'(a)$.

To write an expression for $(h, X)_K$, one uses the same considerations as in (5.27).
Other examples of reproducing kernel Hilbert spaces are given in [39] and [40].


6. Regression analysis of time series with known covariance function. The 
theory of regression analysis (and of the general linear hypothesis) plays a 
central role in statistical theory. In this section we show how to solve certain 
standard problems of regression analysis in cases in which the observations 
possess properties of dependence or continuity. For a discussion of the history 
and literature of regression analysis the reader is referred to Wold [58]. 

The classical problem of regression analysis may be posed as follows. Given (i)
observations X(t), $t = 1, \ldots, N$, with known covariance kernel

(6.1)    $K(s, t) = \operatorname{Cov}[X(s), X(t)]$

and mean value function $m(t) = E[X(t)]$ of the form

(6.2)    $m(t) = \beta_1 w_1(t) + \cdots + \beta_q w_q(t)$

where $w_1(\cdot), \ldots, w_q(\cdot)$ are known functions, and $\beta_1, \ldots, \beta_q$ are unknown
real numbers, and (ii) a linear function

(6.3)    $\psi(\beta) = \psi_1 \beta_1 + \cdots + \psi_q \beta_q$

of the parameters, where $\psi_1, \ldots, \psi_q$ are known constants. Estimate $\psi(\cdot)$ by an
estimate which (i) is linear in the observations in the sense that it is of the form
$\sum_{t=1}^{N} c_t X(t)$ for some real numbers $c_1, \ldots, c_N$, (ii) is an unbiased estimate
of $\psi(\cdot)$ in the sense that, for all $\beta = (\beta_1, \ldots, \beta_q)$,

(6.4)    $E_\beta\Bigl[\sum_{t=1}^{N} c_t X(t)\Bigr] = \sum_{t=1}^{N} c_t m(t) = \sum_{j=1}^{q} \beta_j \sum_{t=1}^{N} c_t w_j(t) = \psi(\beta)$,

and (iii) has variance

(6.5)    $\operatorname{Var}\Bigl[\sum_{t=1}^{N} c_t X(t)\Bigr] = \sum_{s,t=1}^{N} c_s K(s, t) c_t$,

equal to the minimum variance of any unbiased linear estimate.
The problem of finding the minimum variance unbiased linear estimate of a
linear parametric function $\psi(\beta)$ can be posed as a problem involving the
minimization of a quadratic form subject to linear restraints. Define
$K = \{K(s, t)\}$,

(6.6)    $W = \begin{pmatrix} w_1(1) & \cdots & w_q(1) \\ \vdots & & \vdots \\ w_1(N) & \cdots & w_q(N) \end{pmatrix}, \qquad \psi = \begin{pmatrix} \psi_1 \\ \vdots \\ \psi_q \end{pmatrix},$

and let c' denote the transpose of a (column) vector c. The unbiasedness
condition (6.4) can be stated in matrix form as

(6.7)    $c' W = \psi'$.

The problem of finding the minimum variance unbiased linear estimate can now
be posed as follows: find the vector c which minimizes the quadratic form $c' K c$,
subject to the constraints $c' W = \psi'$ (compare Bush and Olkin [7]).

THEOREM 6A. Let K be a positive definite $n \times n$ symmetric matrix, W be an
$n \times q$ matrix, and $\psi$ a q-vector. Assume that

(6.8)    $V = W' K^{-1} W$

is non-singular. The n-vector c* which minimizes the quadratic form $c' K c$ among
all n-vectors c satisfying $W' c = \psi$ is given by

(6.9)    $c^* = K^{-1} W V^{-1} \psi$

and the minimum value of the quadratic form is given by

(6.10)   $c^{*\prime} K c^* = \psi' V^{-1} \psi$.

PROOF. One easily verifies that the vector c* defined by (6.9) satisfies the
restraint $W' c = \psi$, and that (6.10) holds. To complete the proof we show that for
any n-vector c such that $c' W = \psi'$ it holds that

(6.11)   $c' K c \ge \psi' V^{-1} \psi$.

Now for any q-vector z, letting y = Wz,

(6.12)   $c' K c \ge \frac{(c' y)^2}{y' K^{-1} y} = \frac{(c' W z)^2}{z' V z} = \frac{(\psi' z)^2}{z' V z}$.

Taking the supremum of the right side of (6.12) over all q-vectors z, one obtains
(6.11), since

(6.13)   $\sup_z\, [(\psi' z)^2 / z' V z] = \psi' V^{-1} \psi$.


From Theorem 6A, one immediately obtains Theorem 6B.

THEOREM 6B. The minimum variance linear unbiased estimate of a parametric
function $\psi(\beta)$ is

(6.14)   $\psi^* = c^{*\prime} X = \psi' V^{-1} (W' K^{-1} X)$.

The variance of $\psi^*$ is given by

(6.15)   $\operatorname{Var}[\psi^*] = c^{*\prime} K c^* = \psi' V^{-1} \psi$.

In particular, the vector $\beta^{*\prime} = (\beta_1^*, \ldots, \beta_q^*)$ of minimum variance
unbiased linear estimates of $\beta_1, \ldots, \beta_q$ may be written

(6.16)   $\beta^* = V^{-1} (W' K^{-1} X)$

with covariance matrix

(6.17)   $\{\operatorname{Cov}[\beta_i^*, \beta_j^*]\} = V^{-1}$.
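
Theorems 6A and 6B translate directly into a few lines of linear algebra. The sketch below assumes a hypothetical covariance $K(s, t) = 0.8^{|s - t|}$, a straight-line mean value function (so $w_1(t) = 1$, $w_2(t) = t$), and takes $\psi(\beta) = \beta_2$, the slope; all of these are illustrative choices. It computes c* from (6.9) and compares its variance with that of another unbiased weight vector, obtained by adding to c* a vector d with W'd = 0.

```python
import numpy as np

N, q = 7, 2
t = np.arange(1, N + 1, dtype=float)
K = 0.8 ** np.abs(np.subtract.outer(t, t))      # assumed known covariance matrix
W = np.column_stack([np.ones(N), t])            # w_1(t) = 1, w_2(t) = t
psi = np.array([0.0, 1.0])                      # psi(beta) = beta_2, the slope

Kinv_W = np.linalg.solve(K, W)                  # K^{-1} W
V = W.T @ Kinv_W                                # (6.8)
c_star = Kinv_W @ np.linalg.solve(V, psi)       # (6.9)
min_var = psi @ np.linalg.solve(V, psi)         # (6.10)

# Another unbiased estimate: c_star + d, where d is orthogonal to the columns of W.
Q, _ = np.linalg.qr(W, mode="complete")
d = Q[:, q:] @ np.array([0.3, -0.2, 0.5, 0.1, -0.4])
c_other = c_star + d

print(c_star @ K @ c_star, min_var, c_other @ K @ c_other)
```

Since $d' K c^* = d' W V^{-1} \psi = 0$, the variance of the competing estimate exceeds the minimum by exactly $d' K d$.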


The foregoing treatment of the problem of regression analysis with known co- 
variance function depended very much on the assumptions that there were only 
a finite number of observations, and that the matrices K and V were non-singular. 
We now show how to relax these assumptions by using the reproducing kernel 
Hilbert space representation of a time series. The results we now state include 
as special cases the results which were first obtained by Grenander ([12], [14]). 

Let $\{X(t), t \in T\}$ be a time series whose covariance kernel
$K(s, t) = \operatorname{Cov}[X(s), X(t)]$ is known and whose mean value function
$m(t) = E[X(t)]$ is only assumed to belong to a known class M. Let H(K) be the
reproducing kernel Hilbert space corresponding to K. Assume that M is a subset of
H(K). It may be shown that between $L_2(X(t), t \in T)$ and H(K) there exists a
one-one linear mapping with the following properties: if $(h, X)_K$ denotes the
random variable in $L_2(X(t), t \in T)$ which corresponds under the mapping to the
function h in H(K), then for every t in T, and h and g in H(K),

(6.18)   $(K(\cdot, t), X)_K = X(t)$,
(6.19)   $E_m[(h, X)_K] = (h, m)_K$, for all m in M,
(6.20)   $\operatorname{Cov}[(h, X)_K, (g, X)_K] = (h, g)_K$.

The subscript m on an expectation operator is written to indicate that the
expectation is computed under the assumption that $m(\cdot)$ is the true mean value
function.

If T is finite, and K is non-singular, then $(h, X)_K = h' K^{-1} X$. For other
examples of $(h, X)_K$, see Section 5.

We are interested in estimating various functionals $\psi(m)$ of the true mean
value function $m(\cdot)$ by estimates which (i) are linear in the observations
$\{X(t), t \in T\}$ in the sense that they belong to $L_2(X(t), t \in T)$, (ii) are
unbiased and (iii) have minimum variance among all linear unbiased estimates. A
functional $\psi(m)$ is said to be linearly estimable if it possesses an unbiased
linear estimate $(g, X)_K$. Since

(6.21)   $E_m[(g, X)_K] = (g, m)_K = \psi(m)$, for all m in M

it follows that $\psi(m)$ is linearly estimable if and only if there exists a function
g in H(K) satisfying (6.21). Now the variance of a linear estimate is given by

(6.22)   $\operatorname{Var}[(g, X)_K] = (g, g)_K$.

Consequently finding the minimum variance unbiased linear estimate
$\psi^* = (g^*, X)_K$ of $\psi(m)$ is equivalent to finding that function g* in H(K) which
has minimum norm among all functions g satisfying the restraint (6.21). To find the
vector g* with minimum norm it suffices to find any vector g satisfying (6.21).
Then the projection

(6.23)   $g^* = E^*[g \mid \bar{M}]$,

of g onto the smallest Hilbert subspace $\bar{M}$ containing M, satisfies (6.21) and
has minimum norm among all vectors satisfying (6.21).

THEOREM 6C. The uniformly minimum variance unbiased linear estimate $\psi^*$ of a
linearly estimable function $\psi(m)$ is given by

(6.24)   $\psi^* = (E^*[g \mid \bar{M}], X)_K$

with variance

(6.25)   $\operatorname{Var}[\psi^*] = \|E^*[g \mid \bar{M}]\|_K^2$,

where g is any function satisfying (6.21), $\bar{M}$ is the smallest Hilbert subspace of
H(K) containing M, and $E^*[g \mid \bar{M}]$ denotes the projection onto $\bar{M}$ of g. In
particular, the uniformly minimum variance unbiased linear estimate m*(t) of the
value m(t) at a particular point t of the mean value function $m(\cdot)$ is given by

(6.26)   $m^*(t) = (E^*[K(\cdot, t) \mid \bar{M}], X)_K$

since

(6.27)   $m(t) = (K(\cdot, t), m)_K$.

In the special case that M consists of all functions m(t) of the form of (6.2),
and the matrix

(6.28)   $V = \begin{pmatrix} (w_1, w_1)_K & \cdots & (w_1, w_q)_K \\ \vdots & & \vdots \\ (w_q, w_1)_K & \cdots & (w_q, w_q)_K \end{pmatrix}$

is non-singular, then

(6.29)   $\beta^* = V^{-1} \begin{pmatrix} (w_1, X)_K \\ \vdots \\ (w_q, X)_K \end{pmatrix}$.

One may write an explicit formula for the minimum variance unbiased linear
estimate $\psi^*$ of a linear parametric function
$\psi(\beta) = \psi_1 \beta_1 + \cdots + \psi_q \beta_q$ as follows, where $V_{ij} = (w_i, w_j)_K$:

(6.30)   $\psi^* = - \begin{vmatrix} V_{11} & \cdots & V_{1q} & (X, w_1)_K \\ \vdots & & \vdots & \vdots \\ V_{q1} & \cdots & V_{qq} & (X, w_q)_K \\ \psi_1 & \cdots & \psi_q & 0 \end{vmatrix} \div \begin{vmatrix} V_{11} & \cdots & V_{1q} \\ \vdots & & \vdots \\ V_{q1} & \cdots & V_{qq} \end{vmatrix}$.



It should be noted that the proof of Theorem 6C is exactly the same in spirit 
as the proof of the Gauss-Markov theorem given in Scheffé ([46], p. 14). The 
point of Theorem 6C is that it enables one to develop a theory of regression analy- 
sis and analysis of variance for cases in which one has an infinite number of ob- 
servations. In particular, we state the analogues of certain basic results on simul- 
taneous confidence intervals (Scheffé [46], p. 68) and hypothesis testing (Scheffé 
[46], p. 31). 

Hypothesis testing and simultaneous confidence bands for mean value functions.
If the time series X(t) is assumed to be normal, or if all linear functionals
$(h, X)_K$ may be assumed to be approximately normally distributed, then one may
state a confidence band for the entire mean value function $m(\cdot)$ as follows.
Given a confidence level $\alpha$, let $C_q(\alpha)$ denote the $\alpha$ percentile of the
$\chi^2$ distribution with q degrees of freedom; in symbols,
$P[\chi_q^2 \le C_q(\alpha)] = \alpha$.

We now show that if the smallest space $\bar{M}$ containing all mean value functions
has finite dimension q, then

(6.31)   $m^*(t) - [C_q(\alpha)]^{1/2} \sigma[m^*(t)] \le m(t) \le m^*(t) + [C_q(\alpha)]^{1/2} \sigma[m^*(t)]$
         for all t in $-\infty < t < \infty$

is a simultaneous confidence band for all values of the mean value function with
a level of significance not less than $\alpha$; that is, if $m(\cdot)$ is the true mean value
function then (6.31) holds with a probability greater than or equal to $\alpha$.

To prove (6.31) we prove more generally the following theorem. 

TuHeEoreEM 6C. (Simultaneous confidence interval of significance level a for all 
estimable functions (m, g).) If M has dimension q then for all m in M 


> 


\(X, E*[g| M))x — (m,g)el’ & ¢ 


(6.32) ol sup " ry 
gcH(K) Var [(X, E* |g | M))xl 


PROOF. Let $w_1, \cdots, w_q$ be orthonormal functions which span M. Then we
may write $m = \beta_1 w_1 + \cdots + \beta_q w_q$ where $\beta_i = (m, w_i)_K$ is a function of m.
Further, $(m, g)_K = a_1\beta_1 + \cdots + a_q\beta_q$,
$(X, E^*[g \mid M])_K = a_1\beta_1^* + \cdots + a_q\beta_q^*$, and
$\operatorname{Var}[(X, E^*[g \mid M])_K] = \sum_{i=1}^{q} a_i^2$, where $a_i = (w_i, g)_K$ and
$\beta_i^* = (X, w_i)_K$. Next the random variable appearing in (6.32) is equal to

$$
\sup_{a_1, \cdots, a_q} \frac{\left|\sum_{i=1}^{q} a_i(\beta_i^* - \beta_i)\right|^2}{\sum_{i=1}^{q} a_i^2}
= \sum_{i=1}^{q} (\beta_i^* - \beta_i)^2 \tag{6.33}
$$

which is distributed as $\chi^2_q$ (compare Scheffé [46], p. 416).
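
The equality of the supremum and the sum of squares in (6.33) is an instance of the Cauchy-Schwarz inequality. The step may be sketched as follows, writing $d_i = \beta_i^* - \beta_i$ (a shorthand introduced here only for illustration, not notation from the paper):

```latex
% Cauchy-Schwarz: for every non-zero vector (a_1, ..., a_q),
\Bigl|\sum_{i=1}^{q} a_i d_i\Bigr|^{2}
   \le \Bigl(\sum_{i=1}^{q} a_i^{2}\Bigr)\Bigl(\sum_{i=1}^{q} d_i^{2}\Bigr),
\qquad\text{so}\qquad
\frac{\bigl|\sum_{i} a_i d_i\bigr|^{2}}{\sum_{i} a_i^{2}} \le \sum_{i=1}^{q} d_i^{2}.
% Equality holds when a_i = d_i for all i, so the supremum over
% (a_1, ..., a_q) is attained and equals \sum_i d_i^2.
```

Under the normality assumption of this subsection the $d_i$ are uncorrelated normal variables with zero mean and unit variance, since the $w_i$ are orthonormal in $H(K)$; this is what gives the $\chi^2_q$ distribution.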

Similarly one may prove the following theorem. 

THEOREM 6D. Given a q-dimensional subspace M of H(K), and a q'-dimensional
subspace M' of M, to test the composite null hypothesis $H_0 : m(\cdot) \in M'$, against the
composite alternative hypothesis $H_1 : m(\cdot) \in M$, one may use the statistic

$$ \Delta = \| m_M^*(t) - m_{M'}^*(t) \|_K^2 \tag{6.34} $$

where $m_M^*(t)$ [$m_{M'}^*(t)$] denotes the minimum variance unbiased linear estimate of
m(t) under the hypothesis $H_1$ [$H_0$]. Under $H_0$, $\Delta$ is distributed as $\chi^2$ with q - q'
degrees of freedom.

In the special case that M consists of all functions m(t) of the form (6.2), and
M' consists of all functions in M for which $\beta_j = 0$ for $j = q' + 1, \cdots, q$, then
the statistic $\Delta$ may be written as in (6.35), where $V_{ij} = (w_i, w_j)_K$.

[Displays (6.35) and (6.36) are illegible in the source. The surviving fragments
involve the residuals $w_j - E^*[w_j \mid w_1, \cdots, w_{j-1}]$ and ratios of determinants of
the matrix $\{V_{ij}\}$ bordered by the entries $(w_j, X)_K$.]

The reader may find it illuminating to write out (6.36) in the case that q = 2
and q' = 1.

Regression analysis when the covariance function of the observations is only known
up to a constant factor. Suppose that the covariance function of the time series
$\{X(t), t \in T\}$ is of the form

$$ \operatorname{Cov}[X(s), X(t)] = \sigma^2 K(s, t) $$

where the kernel K(s, t) is known and $\sigma^2$ is an unknown positive constant, and
that the mean value function m(t) = E[X(t)] is known to belong to a set M
which is a subspace (of dimension q) of H(K), the reproducing kernel Hilbert
space corresponding to K. Theorem 6C continues to hold, except that (6.25)
should be replaced by

$$ \operatorname{Var}_\sigma[\psi^*] = \sigma^2 \| E^*[g \mid M] \|_K^2. \tag{6.25'} $$

The variance of the estimate $\psi^*$ depends on the unknown parameter $\sigma^2$.
Therefore one needs to estimate $\sigma^2$ in order to know $\operatorname{Var}_\sigma[\psi^*]$. To discuss the
estimation of $\sigma^2$, we need to distinguish between the case in which the index set
T is finite and the case in which T is infinite.

If T is finite, the time series $\{X(t), t \in T\}$, regarded as a function of t, may be
shown to belong to H(K). Further if n is the dimension of H(K), then for all
possible mean value functions m(t) and values of $\sigma^2$

$$
\begin{aligned}
E[\| X(t) - m(t) \|_K^2] &= n\sigma^2, \\
E[\| m^*(t) - m(t) \|_K^2] &= q\sigma^2, \\
E[\| X(t) - m^*(t) \|_K^2] &= (n - q)\sigma^2.
\end{aligned}
$$

Therefore

$$ \sigma^{*2} = (n - q)^{-1} \| X(t) - m^*(t) \|_K^2 \tag{6.38} $$

is an unbiased estimate of $\sigma^2$ (which in the case of normally distributed
observations is independent of m*(t)).

If T is infinite, it is possible to estimate $\sigma^2$ exactly by forming a sequence of
estimates of the form of (6.38) based on a monotone sequence of finite subsets
$\{T_n\}$ of T whose limit is dense in T.
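
For a finite index set the estimate (6.38) can be computed directly. The following is a minimal sketch, not the paper's own procedure: it assumes the standard identification of the reproducing kernel inner product with $(f, g)_K = f' K^{-1} g$ when T has n points and K is non-singular, and it takes m*(t) to be the generalized least squares fit on spanning functions $w_1, \cdots, w_q$. The function names (`solve`, `inner_k`, `fit_mean`, `sigma2_estimate`) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def inner_k(f, g, k):
    """Assumed finite-T form of the RKHS inner product: (f, g)_K = f' K^{-1} g."""
    return sum(fi * hi for fi, hi in zip(f, solve(k, g)))


def fit_mean(x, w, k):
    """Generalized least squares fit m*(t) on the spanning functions w[0..q-1]."""
    q = len(w)
    v = [[inner_k(w[i], w[j], k) for j in range(q)] for i in range(q)]  # V_ij
    rhs = [inner_k(w[i], x, k) for i in range(q)]
    beta = solve(v, rhs)
    return [sum(beta[j] * w[j][t] for j in range(q)) for t in range(len(x))]


def sigma2_estimate(x, w, k):
    """Estimate of sigma^2 in the form of (6.38): ||X - m*||_K^2 / (n - q)."""
    m_star = fit_mean(x, w, k)
    resid = [xi - mi for xi, mi in zip(x, m_star)]
    return inner_k(resid, resid, k) / (len(x) - len(w))


if __name__ == "__main__":
    n = 5
    kernel = [[0.5 ** abs(s - t) for t in range(n)] for s in range(n)]
    basis = [[1.0] * n, [float(t) for t in range(n)]]  # constant and linear trend
    data = [2.1, 4.8, 8.3, 10.9, 14.2]
    print(sigma2_estimate(data, basis, kernel))
```

With K equal to the identity matrix and a single constant spanning function, the estimate reduces to the usual sample variance with divisor n - 1, which is a quick sanity check on the sketch.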


7. The probability density functional of a normal process. The prediction and 
regression problems considered in the foregoing have all involved linear estimates 
chosen according to a criterion expressed in terms of mean square error. Never- 
theless the mathematical tools developed continue to play an important role if 
one desires to employ other criteria of statistical inference. All modern theories 
of statistical inference take as their starting point the idea of the probability 
density function of the observations. Thus in order to apply any principle of 
statistical inference to problems of time series analysis, it is first necessary to 
develop the notion of the probability density function (or functional) of a sto- 
chastic process. In this section we state a result showing how one may write a 
formula for the probability density function of a stochastic process which is 
normal. 


Given a normal time series $\{X(t), t \in T\}$ with known covariance function

$$ K(s, t) = \operatorname{Cov}[X(s), X(t)] \tag{7.1} $$

and mean value function m(t) = E[X(t)], let $P_m$ be the probability measure
induced on the space of sample functions of the time series. Next, let $m_1$ and $m_2$ be
two functions, and let $P_1$ and $P_2$ be the probability measures induced by normal
time series with the same covariance kernel K, and mean value functions equal
to $m_1$ and $m_2$ respectively. By the Lebesgue decomposition theorem it follows
that there is a set N of $P_1$-measure 0 and a non-negative $P_1$-integrable function,
denoted by $dP_2/dP_1$, such that for every measurable set B of sample functions

$$ P_2(B) = \int_B (dP_2/dP_1)\, dP_1 + P_2(BN). \tag{7.2} $$

If $P_2(N) = 0$, then $P_2$ is absolutely continuous with respect to $P_1$, and $dP_2/dP_1$
is called the probability density function of $P_2$ with respect to $P_1$. Two measures
which are absolutely continuous with respect to one another are called equivalent.
Two measures $P_1$ and $P_2$ are said to be orthogonal if there is a set N such that
$P_1(N) = 0$ and $P_2(N) = 1$.

It has been proved, independently by various authors under various hypotheses 
(for references, see [40], Section 4), that two normal probability measures are 
either equivalent or orthogonal. From the point of view of obtaining an explicit 
formula for the probability density function, the following formulation of this 
theorem is useful. 

THEOREM 7A (Parzen [37], [40]). Let $P_m$ be the probability measure induced on
the space of sample functions of a time series $\{X(t), t \in T\}$ with covariance kernel K
and mean value function m. Assume that either (i) T is countable or (ii) T is a
separable metric space, K is continuous, and the stochastic process $\{X(t), t \in T\}$ is
separable. Let $P_0$ be the probability measure corresponding to the normal process with
covariance kernel K and mean value function m(t) = 0. Then $P_m$ and $P_0$ are
equivalent or orthogonal, depending on whether m does or does not belong to the
reproducing kernel Hilbert space H(K). If m belongs to H(K), then the probability
density functional of $P_m$ with respect to $P_0$ is given by

$$ f(X, m) = dP_m/dP_0 = \exp\{(X, m)_K - (\tfrac{1}{2})(m, m)_K\}. \tag{7.3} $$
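
When T is a finite set of n points, formula (7.3) can be checked against the ordinary ratio of multivariate normal densities. The following worked step is an illustration only; it assumes that K is a non-singular n by n matrix, so that the inner product in H(K) takes the form $(f, g)_K = f' K^{-1} g$, and it writes X and m for the column vectors of observations and mean values.

```latex
% Ratio of the N(m, K) density to the N(0, K) density at the point X:
\frac{dP_m}{dP_0}(X)
  = \frac{\exp\{-\tfrac12 (X-m)' K^{-1} (X-m)\}}{\exp\{-\tfrac12 X' K^{-1} X\}}
  = \exp\{ X' K^{-1} m - \tfrac12\, m' K^{-1} m \}
  = \exp\{ (X, m)_K - \tfrac12 (m, m)_K \}.
% The normalizing constants (2\pi)^{-n/2} |K|^{-1/2} cancel because both
% measures share the covariance kernel K.
```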


Using the concrete formula for the probability density functional of a normal 
process provided by (7.3), there is no difficulty in applying the concepts of 


classical statistical methodology to problems of inference on normal time series. 
In particular the following theorem may be proved. 

THEOREM 7B. Let $\{X(t), t \in T\}$ be a normal time series, satisfying the
assumptions of Theorem 7A with known covariance kernel K(s, t) = Cov [X(s), X(t)],
whose mean value function is only assumed to belong to a known class M. If M is a
finite dimensional subspace of the reproducing kernel space H(K), then the
maximum likelihood estimate $m^*(\cdot)$, defined as that estimate in the space M of admissible
mean value functions such that

$$ f(X, m^*) = \max_{m \in M} f(X, m), \tag{7.4} $$

exists and is given at each t in T by the right hand side of (6.26).

If M is an infinite dimensional space, then a maximum likelihood estimate does 
not exist. This is not too surprising, since M is not compact in this case. However, 
an estimate does exist which is the uniformly minimum variance unbiased linear 
estimate of the value m(t) at a particular time ¢ of the mean value function; 
this estimate is given by (6.26). 

The theory of reproducing kernel Hilbert spaces turns out to provide a natural 
tool for treating problems of minimum variance unbiased estimation (see Parzen 
[37]). Further work along these lines in the case of normal time series is being 
done by Ylvisaker ([59]).


8. Correlation analysis of regression free stationary time series. In this
section, we state some results for discrete parameter time series (a more
comprehensive survey is given by Hannan [17]). Many of the results stated may be
extended to continuous parameter time series.

We consider a discrete parameter time series $\{X(t), t = 1, 2, \cdots\}$, with zero
means, which is weakly stationary of order 4 in the sense that its covariance
function

$$ R(v) = E[X(t)X(t + v)] \tag{8.1} $$

and its fourth cumulant function

$$
\begin{aligned}
Q(v_1, v_2, v_3) = {} & E[X(t)X(t + v_1)X(t + v_2)X(t + v_3)] \\
& - R(v_1)R(v_2 - v_3) - R(v_2)R(v_1 - v_3) - R(v_3)R(v_1 - v_2)
\end{aligned} \tag{8.2}
$$

are independent of t.
EXAMPLE: Linear Processes. A discrete parameter time series X(t) is said to be
a linear process, if it may be represented

$$ X(t) = \sum_{a=-\infty}^{\infty} w(t - a)\,\eta(a) \tag{8.3} $$

where $\sum_{a=-\infty}^{\infty} |w(a)| < \infty$, and $\{\eta(a), a = 0, \pm 1, \cdots\}$ is a sequence of
independent identically distributed random variables with zero means, finite fourth
cumulant $\kappa_4$, and second cumulant $\kappa_2$. A linear process X(t) is weakly stationary
up to order 4, with covariance function, spectral density function, and fourth
cumulant function satisfying

$$
\begin{aligned}
R(v) &= \kappa_2 \sum_{a=-\infty}^{\infty} w(a)w(a + v), \qquad
f(\lambda) = \frac{\kappa_2}{2\pi} \Bigl| \sum_{a=-\infty}^{\infty} w(a) e^{-i\lambda a} \Bigr|^2, \\
Q(v_1, v_2, v_3) &= \kappa_4 \sum_{a=-\infty}^{\infty} w(a)w(a + v_1)w(a + v_2)w(a + v_3), \\
\sum_{u=-\infty}^{\infty} Q(v_1, u, u + v_2) &= \alpha R(v_1)R(v_2), \qquad \alpha = \frac{\kappa_4}{(\kappa_2)^2}.
\end{aligned} \tag{8.4}
$$
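
The last relation in (8.4), which ties the summed fourth cumulant function to a product of covariances, can be verified numerically for a linear process with finitely many non-zero weights. The sketch below is an illustration under that finiteness assumption; the names `kappa2` and `kappa4` for the second and fourth cumulants of the innovations, and all function names, are choices made here rather than notation taken from the paper.

```python
def weight(w, a):
    """w(a) for a weight sequence that vanishes outside 0, ..., len(w) - 1."""
    return w[a] if 0 <= a < len(w) else 0.0


def cov(w, kappa2, v):
    """Covariance function R(v) = kappa2 * sum_a w(a) w(a + v)."""
    return kappa2 * sum(weight(w, a) * weight(w, a + v) for a in range(len(w)))


def cum4(w, kappa4, v1, v2, v3):
    """Fourth cumulant function Q(v1, v2, v3) of the linear process."""
    return kappa4 * sum(
        weight(w, a) * weight(w, a + v1) * weight(w, a + v2) * weight(w, a + v3)
        for a in range(len(w))
    )


def summed_cum4(w, kappa4, v1, v2):
    """Sum over u of Q(v1, u, u + v2); terms vanish for |u| >= len(w)."""
    span = 2 * len(w)
    return sum(cum4(w, kappa4, v1, u, u + v2) for u in range(-span, span + 1))


if __name__ == "__main__":
    w = [1.0, 0.5, -0.25]
    kappa2, kappa4 = 2.0, 3.0
    ratio = kappa4 / kappa2 ** 2
    for v1 in range(0, 3):
        for v2 in range(0, 3):
            lhs = summed_cum4(w, kappa4, v1, v2)
            rhs = ratio * cov(w, kappa2, v1) * cov(w, kappa2, v2)
            print(v1, v2, lhs, rhs)
```

The two printed columns agree because the double sum over a and u factors into a product of two single sums, each proportional to a covariance.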
Correlation analysis is concerned with estimating the covariance function R(v),
and the normalized covariance (or correlation) function

$$ \rho(v) = R(v)/R(0) \tag{8.5} $$

of a stationary time series.

Given observations $\{X(t), t = 1, 2, \cdots, N\}$, one can form the sample covariance
function, for $|v| \le N - 1$,

$$ R_N(v) = \frac{1}{N} \sum_{t=1}^{N-|v|} X(t)X(t + |v|) \tag{8.6} $$

which has mean

$$ E[R_N(v)] = \Bigl(1 - \frac{|v|}{N}\Bigr) R(v). \tag{8.7} $$

As an estimate of R(v), $R_N(v)$ is biased (although asymptotically unbiased).
Consequently if we are interested in estimating R(v) it may be preferable to take
as our estimate

$$ \tilde{R}_N(v) = [N/(N - |v|)]\, R_N(v). $$

Many authors have advocated the use of the unbiased estimate $\tilde{R}_N(v)$ in
preference to the biased estimate $R_N(v)$. However, it appears to me that $R_N(v)$ is
preferable to $\tilde{R}_N(v)$ for two reasons: (i) $R_N(v)$ is a positive definite function of v,
which is not the case of $\tilde{R}_N(v)$; (ii) the mean square error of $R_N(v)$ as an estimate
of R(v) is in general less than that of $\tilde{R}_N(v)$. That (i) holds is immediate. That
(ii) holds is shown in Parzen [42]. It will be seen that for theoretical purposes it
is certainly more useful to consider $R_N(v)$ rather than $\tilde{R}_N(v)$.
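
The two estimates can be compared on a small record. The sketch below assumes a zero-mean series stored as a Python list (so X(1) is `x[0]`); the function names are hypothetical. It also shows one way in which the unbiased estimate can fail to be positive definite: a positive definite function must satisfy $|R(v)| \le R(0)$, and the unbiased estimate can violate this bound.

```python
def sample_cov(x, v):
    """Biased sample covariance R_N(v) of (8.6): divisor N, zero-mean series."""
    n, v = len(x), abs(v)
    return sum(x[t] * x[t + v] for t in range(n - v)) / n


def sample_cov_unbiased(x, v):
    """Unbiased variant [N / (N - |v|)] R_N(v); requires |v| <= N - 1."""
    n, v = len(x), abs(v)
    return sample_cov(x, v) * n / (n - v)


def sample_corr(x, v):
    """Sample correlation R_N(v) / R_N(0), as in (8.22)."""
    return sample_cov(x, v) / sample_cov(x, 0)


if __name__ == "__main__":
    x = [1.0, 0.0, 1.0]
    # Biased estimate: the lag-2 value stays below the lag-0 value.
    print(sample_cov(x, 0), sample_cov(x, 2))
    # Unbiased estimate: the lag-2 value exceeds the lag-0 value,
    # which no positive definite function allows.
    print(sample_cov_unbiased(x, 0), sample_cov_unbiased(x, 2))
```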

Using the law of large numbers proved in Parzen ([38], pp. 419-420), one
may prove the following theorems on consistency of the sample covariance
function.

THEOREM 8A. The sample covariance function of a weakly stationary time series
is consistent in quadratic mean, in the sense that, for $v = 0, 1, \cdots$,

$$ \lim_{N \to \infty} E|R_N(v) - R(v)|^2 = 0 \tag{8.8} $$

if the time series is weakly stationary of order 4, and satisfies (for $v = 0, 1, \cdots$)

$$ \lim_{N \to \infty} \frac{1}{N} \sum_{s=0}^{N-1} R^2(s) = 0 \tag{8.9} $$

$$ \lim_{N \to \infty} \frac{1}{N} \sum_{s=0}^{N-v-1} Q(v, s, v + s) = 0. \tag{8.10} $$

THEOREM 8B. The sample covariance function of a weakly stationary time series
is strongly consistent, in the sense that, for each $v = 0, 1, \cdots$,

$$ P[\lim_{N \to \infty} R_N(v) = R(v)] = 1 \tag{8.11} $$

if the time series is weakly stationary of order 4 and satisfies for positive constants
C and q

$$ \frac{1}{N} \sum_{s=0}^{N-1} R^2(s) \le C N^{-q} \quad \text{for all } N \tag{8.12} $$

$$ \Bigl| \frac{1}{N} \sum_{s=0}^{N-v-1} Q(v, s, v + s) \Bigr| \le C N^{-q} \quad \text{for all } N. \tag{8.13} $$


In particular, (8.12) and (8.13) hold if it is assumed that

$$ \sum_{v=-\infty}^{\infty} |R(v)| < \infty \tag{8.14} $$

$$ \sum_{v_1, v_2, v_3 = -\infty}^{\infty} |Q(v_1, v_2, v_3)| < \infty. \tag{8.15} $$

We next obtain expressions for the asymptotic covariance of the sample
covariance function (for proofs of the following theorem see Bartlett [2], [3] or
Parzen [36]).

THEOREM 8C. Let X(t) be a time series weakly stationary of order 4, with
absolutely summable covariance and fourth cumulant functions (that is, (8.14) and
(8.15) hold). Then the sample covariance function $R_N(v)$ has asymptotic covariance,
for any non-negative integers $v_1$ and $v_2$,

$$ \lim_{N \to \infty} N \operatorname{Cov}[R_N(v_1), R_N(v_2)] = D(v_1, v_2) \tag{8.16} $$

where we define

$$
D(v_1, v_2) = \sum_{u=-\infty}^{\infty} \{ R(u)R(u + v_2 - v_1) + R(u)R(u + v_2 + v_1) + Q(v_1, u, u + v_2) \}. \tag{8.17}
$$


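For a linear process with spectral density function $f(\cdot)$

$$
\begin{aligned}
D(v_1, v_2) = {} & 4\pi \int_{-\pi}^{\pi} \cos \lambda v_1 \cos \lambda v_2\, f^2(\lambda)\, d\lambda \\
& + \alpha \int_{-\pi}^{\pi}\!\!\int_{-\pi}^{\pi} \cos \lambda_1 v_1 \cos \lambda_2 v_2\, f(\lambda_1) f(\lambda_2)\, d\lambda_1\, d\lambda_2.
\end{aligned} \tag{8.18}
$$

In particular, the variance of $R_N(v)$ is approximately given by

$$
\operatorname{Var}[R_N(v)] = \frac{4\pi}{N} \int_{-\pi}^{\pi} \cos^2 \lambda v\, f^2(\lambda)\, d\lambda + \frac{\alpha}{N} R^2(v)
\ge \frac{2 + \alpha}{N} R^2(v). \tag{8.19}
$$

The mean square error of $R_N(v)$ as an estimate of R(v) is given by

$$ E|R_N(v) - R(v)|^2 = \operatorname{Var}[R_N(v)] + \Bigl(\frac{v}{N}\Bigr)^2 R^2(v). \tag{8.20} $$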


It was empirically observed by M. G. Kendall that the sample covariance
function (traditionally called the observed correlogram) fails to damp down to 0 for
increasing values of v, although the true covariance function R(v) does damp
down to 0 as v tends to $\infty$. This fact is borne out theoretically by (8.19) and
(8.20), which show that the coefficient of mean square error
$E|R_N(v) - R(v)|^2 / R^2(v)$ is of the order of 1/N for all lags v of the sample
covariance function.

One may state in a variety of ways conditions under which the sample covariance
function $R_N(v)$ is asymptotically normal in the sense that for every choice of lags
$v_1, \cdots, v_k$ and real numbers $u_1, \cdots, u_k$,

$$
\begin{aligned}
& E[\exp i\{ u_1 N^{1/2}(R_N(v_1) - E[R_N(v_1)]) + \cdots + u_k N^{1/2}(R_N(v_k) - E[R_N(v_k)]) \}] \\
& \qquad \to \exp\Bigl\{ -\tfrac{1}{2} \sum_{i,j=1}^{k} u_i D(v_i, v_j) u_j \Bigr\}
\end{aligned} \tag{8.21}
$$

as $N \to \infty$ (see Walker [54], Lomnicki and Zaremba [29], Parzen [35]). In
particular, (8.21) holds if X(t) is a linear process.

As an estimate of the correlation function $\rho(v)$ we take the sample correlation
function

$$ \rho_N(v) = R_N(v)/R_N(0). \tag{8.22} $$

We do not discuss here the question of the best definition of the correlation
function for short series. For a discussion of this problem, and references to the
literature, see Weinstein [55].

By standard large sample statistical theory one readily obtains, from (8.21) 
and (8.16), the following theorem. 

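THEOREM 8D. If X(t) is a linear process, then $\rho_N(v)$ is asymptotically normal
with asymptotic covariances satisfying, as $N \to \infty$,

$$ N E[\{\rho_N(v_1) - \rho(v_1)\}\{\rho_N(v_2) - \rho(v_2)\}] \to d(v_1, v_2) \tag{8.23} $$

where we define

$$ d(v_1, v_2) = 4\pi \int_{-\pi}^{\pi} d\lambda\, \bar{f}^2(\lambda) \{\cos \lambda v_1 - \rho(v_1)\}\{\cos \lambda v_2 - \rho(v_2)\}, \tag{8.24} $$

$$ \bar{f}(\lambda) = \frac{f(\lambda)}{R(0)}. \tag{8.25} $$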

Remark. It should be noted that while the variance of the sample covariance
function $R_N(v)$ of a linear process depends on $\alpha$, the variance of the sample
correlation function $\rho_N(v)$ does not.

PROOF. Using only the first few terms of the Taylor series expansion one
obtains that

$$
\frac{x}{y} - \frac{x_0}{y_0} = -\frac{1}{y_0^2} \{ (y - y_0)x_0 - (x - x_0)y_0 \} + O(|x - x_0|^2 + |y - y_0|^2). \tag{8.26}
$$

Consequently, if $X_n$, $Y_n$, and $Z_n$ are sequences of random variables, and $x_0$, $y_0$,
and $z_0$ are constants, such that

$$ n^{1/2}(X_n - x_0), \quad n^{1/2}(Y_n - y_0), \quad n^{1/2}(Z_n - z_0) $$

are jointly asymptotically normal it follows that

$$ n^{1/2}\Bigl(\frac{X_n}{Y_n} - \frac{x_0}{y_0}\Bigr), \quad n^{1/2}\Bigl(\frac{Z_n}{Y_n} - \frac{z_0}{y_0}\Bigr) $$

are jointly asymptotically normal with asymptotic covariance satisfying

$$
\begin{aligned}
y_0^4\, n \operatorname{Cov}\Bigl[\frac{X_n}{Y_n} - \frac{x_0}{y_0}, \frac{Z_n}{Y_n} - \frac{z_0}{y_0}\Bigr]
\sim {} & x_0 z_0\, n E[(Y_n - y_0)^2] + y_0^2\, n E[(X_n - x_0)(Z_n - z_0)] \\
& - x_0 y_0\, n E[(Y_n - y_0)(Z_n - z_0)] - z_0 y_0\, n E[(Y_n - y_0)(X_n - x_0)].
\end{aligned}
$$

Applying these results to the present case it follows that the sample correlations
$\rho_N(v)$ are asymptotically normal with asymptotic covariances satisfying

$$
\begin{aligned}
R^4(0)\, N \operatorname{Cov}[\rho_N(v_1) - \rho(v_1), \rho_N(v_2) - \rho(v_2)]
\to {} & R(v_1)R(v_2)D(0, 0) + R^2(0)D(v_1, v_2) \\
& - R(0)R(v_1)D(0, v_2) - R(0)R(v_2)D(0, v_1).
\end{aligned} \tag{8.28}
$$

From (8.28) and (8.18), one obtains (8.24).

We are now in a position to obtain confidence intervals for, or test hypotheses
about, a correlation coefficient $\rho(v)$. From Theorem 8D it follows that the sample
correlation coefficient $\rho_N(v)$ may be regarded as being normally distributed with
mean $\rho(v)$ and variance equal to d(v)/N where we define

$$ d(v) = 4\pi \int_{-\pi}^{\pi} d\lambda\, \bar{f}^2(\lambda) \{\cos \lambda v - \rho(v)\}^2. \tag{8.29} $$

Now $d(v) \le 16\pi \int_{-\pi}^{\pi} d\lambda\, \bar{f}^2(\lambda)$; further, for large values of v, approximately
$d(v) = 2\pi \int_{-\pi}^{\pi} d\lambda\, \bar{f}^2(\lambda)$. One thus sees that in order to obtain bounds for d(v)
one must have a knowledge of the quantity

$$ d = 2\pi \int_{-\pi}^{\pi} \bar{f}^2(\lambda)\, d\lambda = \frac{1}{R^2(0)} \sum_{v=-\infty}^{\infty} R^2(v). \tag{8.30} $$

In the study of both correlation analysis and spectral analysis of stationary
time series it will be found that the quantity d arises frequently as information
which one requires about the time series under consideration in order to carry
out various statistical procedures. A satisfactory estimate of d from observations
$\{X(t), t = 1, 2, \cdots, N\}$ is provided by

$$ d_N = \frac{1}{2R_N^2(0)} \sum_{v=-(N-1)}^{N-1} R_N^2(v). \tag{8.31} $$

If one does not desire to compute the sample covariance function for all
$v = 0, 1, \cdots, N$ then one may take, for any $\theta$ in $0 < \theta < 1$,

$$ d_{N,[\theta N]} = \frac{1}{(1 + 2\theta - \theta^2) R_N^2(0)} \sum_{v=-[\theta N]}^{[\theta N]} R_N^2(v) \tag{8.32} $$

as an estimate of d. The properties of the estimates $d_N$ and $d_{N,[\theta N]}$ have been
extensively investigated by Lomnicki and Zaremba [29]; among other things
they show that $d_N$ is a consistent estimate of d which in the case of a linear
process has an asymptotic variance not dependent on the residuals $\{\eta(a)\}$.

An alternate approach to the problem of investigating the mechanism generat- 
ing a time series is to attempt to fit the time series by a finite parameter scheme 
(such as an autoregressive scheme or a moving average scheme). Here we con- 
sider only the problem of fitting an autoregressive scheme which has the most 
developed theory (for recent work on fitting moving average schemes, see
Durbin [10]).


THEOREM 8E. In order that a stationary time series X(t) with covariance
function R(v), satisfy the autoregressive scheme of order m,

$$ X(t) = a_1 X(t - 1) + \cdots + a_m X(t - m) + \eta(t) \tag{8.33} $$

where $\eta(t)$ are a sequence of orthogonal random variables (with common variance
$\sigma^2$) representing the innovation at time t so that

$$ E[X(s)\eta(t)] = 0 \quad \text{for } s < t \tag{8.34} $$

it is necessary and sufficient that the covariance function R(v) satisfy the difference
equation

$$ R(u) = a_1 R(u - 1) + \cdots + a_m R(u - m) \quad \text{for } u > 0 \tag{8.35} $$

while for u = 0

$$ R(0) = a_1 R(1) + \cdots + a_m R(m) + \sigma^2. \tag{8.36} $$


Remark. Equations (8.35) are called the Yule-Walker equations, after G. 
Udny Yule and Sir Gilbert Walker who first obtained relations of this kind (see 
Wold [58], especially pp. 104-5 and pp. 140-146). 

PROOF. Verify that (8.33) and (8.35) are each equivalent to the assertion
that the minimum mean square error linear predictor of X(t), given its infinite
past, depends only on the finite past $X(t - 1), \cdots, X(t - m)$; in symbols,
for all t

$$ E^*[X(t) \mid X(t - 1), \cdots, X(t - m), \cdots] = a_1 X(t - 1) + \cdots + a_m X(t - m). \tag{8.36} $$


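We may use the fact that the covariance of a stationary autoregressive scheme
satisfies the difference equation (8.35) to obtain expressions for the constants
$a_1, \cdots, a_m$ in terms of correlations; (8.35) with $u = 1, \cdots, m$ yields m
equations which may be written in matrix form

$$
\begin{pmatrix}
\rho(0) & \rho(1) & \cdots & \rho(m - 1) \\
\rho(1) & \rho(0) & \cdots & \rho(m - 2) \\
\vdots & \vdots & & \vdots \\
\rho(m - 1) & \rho(m - 2) & \cdots & \rho(0)
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}
=
\begin{pmatrix} \rho(1) \\ \rho(2) \\ \vdots \\ \rho(m) \end{pmatrix}. \tag{8.37}
$$

Consistent asymptotically normal estimates $a_1^{(N)}, \cdots, a_m^{(N)}$ of $a_1, \cdots, a_m$
respectively, may be obtained from observations $\{X(t), t = 1, 2, \cdots, N\}$ by
forming consistent asymptotically normal estimates $\rho_N(v)$ of $\rho(v)$ and defining
$a_j^{(N)}$ to be the solutions of

$$
\begin{pmatrix}
\rho_N(0) & \rho_N(1) & \cdots & \rho_N(m - 1) \\
\rho_N(1) & \rho_N(0) & \cdots & \rho_N(m - 2) \\
\vdots & \vdots & & \vdots \\
\rho_N(m - 1) & \rho_N(m - 2) & \cdots & \rho_N(0)
\end{pmatrix}
\begin{pmatrix} a_1^{(N)} \\ a_2^{(N)} \\ \vdots \\ a_m^{(N)} \end{pmatrix}
=
\begin{pmatrix} \rho_N(1) \\ \rho_N(2) \\ \vdots \\ \rho_N(m) \end{pmatrix}. \tag{8.38}
$$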
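
The m linear equations for the autoregressive coefficients can be solved mechanically once the correlations (or their sample estimates) are available. The sketch below uses plain Gauss-Jordan elimination for clarity; this is a choice made here and is not the paper's procedure (recursive methods exist that exploit the Toeplitz structure). The function names and the list convention `rho[v]` for lags $v = 0, \cdots, m$ are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def yule_walker(rho, order):
    """Solve the Yule-Walker equations for a_1, ..., a_order.

    rho[v] holds the correlation at lag v for v = 0, ..., order (rho[0] == 1).
    The (i, j) entry of the coefficient matrix is rho(|i - j|).
    """
    matrix = [[rho[abs(i - j)] for j in range(order)] for i in range(order)]
    rhs = [rho[i + 1] for i in range(order)]
    return solve(matrix, rhs)


if __name__ == "__main__":
    # Correlations of a first order scheme with a_1 = 0.6: rho(v) = 0.6 ** v.
    rho = [0.6 ** v for v in range(4)]
    print(yule_walker(rho, 1))
    print(yule_walker(rho, 3))
```

Fitting a scheme of higher order than the true one returns coefficients that are zero beyond the true order (up to rounding), which is the basis of the order tests discussed later in this section.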


It may be shown, using standard techniques of large sample statistical theory,
that if the estimates $R_N(v)$ satisfy (8.21), then the estimates $a_j^{(N)}$ satisfy

$$
\begin{aligned}
& E[\exp i\{ u_1 N^{1/2}(a_1^{(N)} - a_1) + \cdots + u_m N^{1/2}(a_m^{(N)} - a_m) \}] \\
& \qquad \to \exp\Bigl\{ -\tfrac{1}{2} \sigma^2 \sum_{i,j=1}^{m} u_i C_{ij} u_j \Bigr\}
\end{aligned} \tag{8.39}
$$

where $\{C_{ij}\}$ is the inverse matrix of the m by m matrix whose (i, j)th entry is
$\rho(i - j)$. As an estimate $\{C_{ij}^{(N)}\}$ of $\{C_{ij}\}$ one may take the inverse matrix of
$\{\rho_N(i - j)\}$, and as an estimate of $\sigma^2$ one may take

$$
\sigma_N^2 = \frac{1}{N - m} \sum_{t=m+1}^{N} \{ X(t) - a_1^{(N)} X(t - 1) - \cdots - a_m^{(N)} X(t - m) \}^2. \tag{8.40}
$$

In words, (8.39) says that the usual theorems of regression analysis apply 
asymptotically to the problem of estimating the autoregressive coefficients, 
even though the regression functions X(t — 1), --- , X(t — m) represent lagged 
values of the observed time series X(t). This fact was first shown by Mann and 
Wald [31] whose paper is a fundamental contribution to the theory of time series
analysis.

To prove (8.39), we write (8.37) and (8.38) in alternate form as follows.
Define $a_0 = a_0^{(N)} = -1$. Then, for $i = 1, 2, \cdots, m$

$$ \sum_{j=0}^{m} a_j \rho(i - j) = 0, \qquad \sum_{j=0}^{m} a_j^{(N)} \rho_N(i - j) = 0. $$

Therefore for $i = 1, \cdots, m$

$$ \sum_{j=0}^{m} \rho_N(i - j) \{ a_j^{(N)} - a_j \} = \sum_{j=0}^{m} a_j \{ \rho(i - j) - \rho_N(i - j) \}. \tag{8.41} $$

From (8.41) one may deduce (8.39).
EXAMPLE. Let us write out the foregoing formulas for the case of an
autoregressive scheme of order 2. Then (8.38) may be written

$$
\begin{aligned}
a_1^{(N)} + a_2^{(N)} \rho_N(1) &= \rho_N(1) \\
a_1^{(N)} \rho_N(1) + a_2^{(N)} &= \rho_N(2).
\end{aligned} \tag{8.42}
$$

The estimates $a_1^{(N)}$ and $a_2^{(N)}$ are explicitly given by

$$
a_1^{(N)} = \frac{\rho_N(1)\{1 - \rho_N(2)\}}{1 - \rho_N^2(1)}, \qquad
a_2^{(N)} = \frac{\rho_N(2) - \rho_N^2(1)}{1 - \rho_N^2(1)}. \tag{8.43}
$$

The estimated covariance matrix $\{\operatorname{Cov}[a_i^{(N)}, a_j^{(N)}]\}$ is given by

$$
\{\operatorname{Cov}[a_i^{(N)}, a_j^{(N)}]\} = \frac{1}{N} \sigma_N^2
\begin{pmatrix} 1 & \rho_N(1) \\ \rho_N(1) & 1 \end{pmatrix}^{-1}. \tag{8.44}
$$



To test the null hypothesis that the time series obeys an autoregressive scheme
of order 1 against the alternative hypothesis that it obeys an autoregressive
scheme of order 2 one uses the statistic

$$
\Delta = \frac{(a_2^{(N)})^2}{\operatorname{Var}[a_2^{(N)}]}
= \frac{N \{\rho_N(2) - \rho_N^2(1)\}^2}{\sigma_N^2 \{1 - \rho_N^2(1)\}} \tag{8.45}
$$

which under the null hypothesis is distributed as $\chi^2$ with 1 degree of freedom.
One may similarly give a test of the null hypothesis that the time series obeys 
an autoregressive scheme of order q’ against the alternative hypothesis that it 
obeys an autoregressive scheme of order q (greater than q’). 

For an excellent review of both the small and large sample theory of goodness 
of fit tests for autoregressive schemes, we refer the reader to the monograph by 
E. J. Hannan [17]. For references to recent work on explosive stochastic
difference equations, see Rao [44a].



SOME MODEL I PROBLEMS OF SELECTION

By E. L. LEHMANN¹
University of California, Berkeley


1. Summary. There are given a populations $\Pi_1, \cdots, \Pi_a$, of which we wish
to select a subset. The quality of the ith population is characterized by a
real-valued parameter $\theta_i$, and a population is said to be

(1) positive (or good) if $\theta_i \ge \theta_0 + \Delta$,
(2) negative (or bad) if $\theta_i \le \theta_0$,

where $\Delta$ is a given positive constant and $\theta_0$ is either a given number or a parameter
that may be estimated. A number of optimum properties of selection procedures
are defined (Section 3) and it is shown that for some of these, the optimum
procedure selects $\Pi_i$ when

(3) $T_i \ge C$,

where $T_i$ is a suitable statistic, the distribution of which depends only on $\theta_i$,
and where C is a suitable constant. (Sections 4 and 6.) Applications are given
to distributions with monotone likelihood ratio in the case that $\theta_0$ is known
(Sections 5 and 6), and to normal distributions when instead observations on
$\theta_0$ are included in the experiment (Sections 10 and 11).


2. Introduction. An important class of classification problems is concerned 
with selection, that is with the classification of items into a superior category 
(the selected items) and an inferior one. We shall not be concerned here with more 
general classification procedures which would divide the items into possibly 
more than two categories. Selection problems have been treated in many differ- 
ent formulations. A basic distinction is that corresponding to Models I and II 
in the analysis of variance. In Model I, the items being classified are considered 
fixed; only the observations made on each item are random. In Model II, on 
the other hand, the items themselves are drawn at random from some popula- 
tion and would therefore change under a replication of the experiment. Model 
II problems have been treated recently, among others by Z. W. Birnbaum [1],
Birnbaum and Chapman [2], T. W. Anderson [3], Cochran [4], Finney [5, 6],
Davies [7], Curnow [8] and Dunnett [9]. For the related problem of the rejection
of outliers, see for example [10]. We shall in the present paper be concerned only
with Model I.

We shall assume therefore that a number of varieties, treatments, production
methods, etc. (in general we shall speak of populations) are at our disposal.
Their quality is characterized by a measure which we shall assume to be scalar. 
From the available set we wish to make a selection, selecting as far as possible 
the best ones. 

It is useful to consider two cases according to the size of the group 
to be selected. 

PROBLEM 1. A first possibility is that we wish to select only a single population
(if possible the best one): the variety to be planted, the production method we 
are going to adopt, etc. As a slight generalization we may wish to select a fixed 
number, say two or three. We may have a fixed number of prizes or fellowships 
to award, or we may not wish to put all eggs into one basket. 

PROBLEM 2. In the second case the group size is variable and is determined
by the observations. This arises for example when we wish to select all worth- 
while treatments or if we want to be reasonably sure that the selected group con- 
tains the best treatment. 

PROBLEM 3. There is finally the intermediate possibility that the group size
is variable but has a fixed upper limit. It may for example be desirable to in- 
vestigate all treatments that appear promising but budget restrictions may 
limit the research program to the investigation of at most three treatments. 

The traditional formal treatment of the class of problems described here, which 
has always been recognized as inadequate, is through tests of homogeneity (as 
for example in the analysis of variance). The only question answered by such a 
test is whether there is any difference at all among the available populations. 

The first step toward a more realistic formulation is due to Mosteller [11] 
who gave a procedure for testing the hypothesis of homogeneity against the 
slippage alternatives that exactly one of the populations has slipped to the right 
and for deciding, in case of rejection of the hypothesis, just which of the popula- 
tions slipped. Mosteller’s paper was at least a partial answer to such an urgent 
need that, in spite of his warnings regarding certain inadequacies in the formula- 
tion, it inspired a large literature on slippage tests. At the same time, it led to 
further clarification of the issues. The first completely satisfactory proposal for 
dealing with a problem of type 2 above was made by Paulson [12], while problem 1
was formulated and essentially solved² by Bahadur [13].

Most of the literature on selection problems so far has been concerned with 
the definition of suitable procedures, an evaluation of their performance char- 
acteristics and the determination of the sample size. An optimum theory was 
developed for problem 1 by Bahadur in [13] and by Bahadur and Goodman in 
[14]. An optimum property of a slippage test, with reference only to slippage 
alternatives, was first proved by Paulson [15]. His proof was applied to other 
problems, was generalized and simplified in papers by Doornbos and Prins 
[16], Kudo [17], Pfanzagl [18], Ramachandran and Khatri [19], Truax [20], and 
Karlin and Truax [21]. Finally, contributions toward optimum properties of
procedures for problem 2 were made by Gupta [22], Robbins [23] and Seal [24],
[25], [26]. In the present paper, we shall be concerned with optimum procedures
for certain cases of problem 2.

² For the nonsequential case which is the only one considered here.

It is useful to introduce here another distinction according to the definition 
of the quality of a population. 

A. In the simplest case, the quality is defined in absolute terms. If a number of 
new treatments are being compared with a standard treatment, it may happen 
that the latter has been observed so extensively that its effect can be taken as 
known. A treatment is then “good” if it is better (or sufficiently much better) 
than the standard. Another example is furnished by the selection of binomial
populations, where success probabilities are compared with the "pure chance"
value p = 1/2, a probability $p_i$ being considered as good if it exceeds this value
or exceeds it by at least a given amount.

B. Usually, in the comparison of new treatments with a standard, it is of 
course better not to treat the standard as known but to let it participate in the 
experiment as a control. A new treatment is then “good” if it compares favorably 
with the control, the effect of which is also determined by the experiment. 

C. Comparisons are not always relative to a standard or control. If a new 
product is being developed, it may be a question of selecting the most promising 
of a number of variants or a number of production methods. In such a case, each 
population must be compared with the totality of the remaining populations. A 
population may then be considered as “‘good”’ if it is (sufficiently much) better 
than the average of the remaining populations or if it does not fall too much 
below the best one. In the present paper, only problems A and B will be
considered.

We mention in conclusion that the applications of selection theory are even 
wider than may appear at first: The emphasis instead of on selection may be on 
elimination. Thus we may wish to eliminate those regression coefficients or 
interactions, which can safely be neglected or those observations that represent 
gross errors. In the latter context slippage procedures, that is, procedures de- 
rived under the assumption of at most one “outlier” were proposed and their 
disadvantages discussed quite early by Pearson and Chandrasekhar [27]. 


3. Formulation of the problem. As in the Neyman-Pearson theory of hy- 
pothesis testing, there are two possible sources of error in any set of selections. 
There is the possibility of false positives, that is, populations which are selected
although they are negative ($\theta_i \le \theta_0$), and of false negatives, that is, populations
which are not selected although they are positive ($\theta_i \ge \theta_0 + \Delta$). Instead of on
false negatives we shall focus attention on true positives, that is, on those positive 
populations which are included in the selected group. This is analogous to the 
replacement of the consideration of an error of the second kind by that of power 
in the Neyman-Pearson theory. 

Roughly speaking, it is the aim of a selection procedure to seek out the true
positives while holding false positives to a minimum.

For measuring how well a procedure carries out its task of identifying the
positive populations, a number of criteria are available.

(a) The expected number of true positives. 

(b) The expected proportion of true positives, that is, the quantity (a) divided 
by the total number of positives. 

These criteria are appropriate if it is desired to include in the selected group 
as many of the positive populations as possible. 

(c) The probability of at least one true positive. 

(d) The probability of including in the selected group the best population
(that is, the population with the largest $\theta$-value), provided it is positive.

These two criteria may be appropriate if the selection is only a step in a scheme,
of which the eventual aim is the selection of a single population.³

(e) The probability of including all good populations.

This criterion implies that one would prefer the selection with probability $\gamma$
of all good populations and with probability $1 - \gamma$ of none of them to the
selection with probability $\gamma - \epsilon$ of all and with probability $1 - \gamma + \epsilon$ of all but one
of the good populations. The criterion would thus seem to be appropriate only
in rare cases.

As a measure of the performance of a procedure with respect to false positives 
we shall take either 

(i) the expected number of false positives 
or 

(ii) the expected proportion of false positives, that is, the quantity (i) divided 
by the total number of negatives. 
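
As a rough numerical illustration of the criteria (a)-(d) and (i), (ii), the following sketch estimates them by simulation for one particular rule. Everything specific in it is an assumption made for illustration and is not taken from the text: one normally distributed observation with unit variance per population, the rule "select population i when X_i >= C", and the names `simulate_criteria`, `theta0`, `delta` and `C`.

```python
import random

def simulate_criteria(theta, theta0, delta, C, n_rep=20000, seed=0):
    """Monte Carlo estimates of criteria (a)-(d) and (i)-(ii) for the
    hypothetical rule 'select population i when X_i >= C', X_i ~ N(theta_i, 1)."""
    rng = random.Random(seed)
    positives = [i for i, t in enumerate(theta) if t >= theta0 + delta]
    negatives = [i for i, t in enumerate(theta) if t <= theta0]
    best = max(range(len(theta)), key=lambda i: theta[i])
    true_pos = false_pos = at_least_one = best_in = 0
    for _ in range(n_rep):
        selected = {i for i, t in enumerate(theta) if rng.gauss(t, 1.0) >= C}
        tp = sum(1 for i in positives if i in selected)
        true_pos += tp
        false_pos += sum(1 for i in negatives if i in selected)
        at_least_one += tp > 0          # bool counts as 0 or 1
        best_in += best in selected
    out = {
        "a": true_pos / n_rep,          # expected number of true positives
        "c": at_least_one / n_rep,      # P(at least one true positive)
        "i": false_pos / n_rep,         # expected number of false positives
    }
    # (b) and (d) are defined only when at least one population is positive;
    # in that case the best population is itself positive.
    if positives:
        out["b"] = out["a"] / len(positives)
        out["d"] = best_in / n_rep
    if negatives:
        out["ii"] = out["i"] / len(negatives)
    return out

est = simulate_criteria(theta=[0.0, 0.0, 2.0, 2.5], theta0=0.0, delta=2.0, C=1.0)
```

In this made-up configuration two populations are negative and two are positive, so all six quantities are defined; raising `C` lowers both the true-positive and the false-positive criteria, which is the trade-off the formulation below makes precise.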

As a generic notation for any one of the quantities (a)-(e), all of which depend
on the parameter point $\theta$ and on the particular selection procedure $\delta$ under
investigation, we shall use $S(\theta, \delta)$. Here it is to be understood that $S$ is defined only
for the set $\Omega'$ of those parameter points for which at least one of the populations
is positive.

Similarly, we shall let $R(\theta, \delta)$ denote the quantity (i) or (ii). With these
definitions of $R$ and $S$, it is desirable to have $S(\theta, \delta)$ as large and $R(\theta, \delta)$ as small
as possible. Specifically, we shall consider the problem of determining a
procedure for which, subject to

(4)    $\inf_{\theta \in \Omega'} S(\theta, \delta) \ge \gamma$

we have

(5)    $\sup_{\theta \in \Omega} R(\theta, \delta) = \min$,

(where $\Omega$ denotes the whole parameter space), or the dual problem in which
$\inf S(\theta, \delta)$ is maximized subject to an upper bound on $\sup R(\theta, \delta)$.

Which of the various formulations is most appropriate depends of course on
the particular circumstances of each problem. In the absence of such more specific
considerations, it seems perhaps most reasonable to control the minimum value
of either (b) or (d). Subject to this condition one might, since each false positive
provides a nuisance disturbance, wish to minimize the maximum value of (i).

³ A complete sequential procedure for dealing with this problem was proposed by Stein
[28]. For more recent work on such sequential procedures, see [29].

It is of interest to note that condition (4) with $S$ given by (b), that is,

(6)    $\inf_{\theta \in \Omega'} [\text{expected proportion of true positives}] \ge \gamma$,

implies

(7)    $\inf_{\theta \in \Omega'} P\{\text{at least one true positive}\} \ge \gamma$.

This follows from the fact that the left-hand side of (7) is never smaller than that of
(6). To see this, denote by $A_i$ the event of including the $i$th population in the
selected group, let $I$ denote the set of indices $i$ for which the $i$th population is
positive, and let $k$ be the number of elements of $I$. Then

    $P\left(\bigcup_{i \in I} A_i\right) \ge \max_{i \in I} P(A_i) \ge \sum_{i \in I} P(A_i)/k$

as was to be proved.
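
The chain of inequalities just used (probability of a union, largest single probability, average probability) can be checked on a small finite sample space. The sketch below is only an illustration; the coin-tossing example and the helper name `check_chain` are assumptions and do not come from the text.

```python
from fractions import Fraction
from itertools import product

def check_chain(events, prob):
    """events: list of sets of sample points; prob: dict point -> probability.
    Returns (P(union of A_i), max_i P(A_i), sum_i P(A_i) / k)."""
    def p(s):
        return sum(prob[w] for w in s)
    union = set().union(*events)
    k = len(events)
    return p(union), max(p(A) for A in events), sum(p(A) for A in events) / k

# Hypothetical example: three fair coin tosses, A_i = "toss i shows heads".
points = list(product("HT", repeat=3))
prob = {w: Fraction(1, 8) for w in points}
events = [{w for w in points if w[i] == "H"} for i in range(3)]

p_union, p_max, p_mean = check_chain(events, prob)
assert p_union >= p_max >= p_mean
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.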
The other proposed condition, (4) with $S$ given by (d), that is,

(8)    $P\{\text{best population is in selected group}\} \ge \gamma$ for all $\theta \in \Omega'$,

can be given the following interpretation. Let $\Sigma$ denote the set of selected indices.
Then $\Sigma$ constitutes a confidence set for the index $i$ corresponding to the best
population, provided attention is restricted to the parameter set $\Omega'$.

We shall prove in the next sections, for certain families of distributions, that
the solution with any of the formulations (a)-(d) combined with either (i) or
(ii) is given by (3) of Section 1 but that this is not true for formulation (e).


4. A minimax solution. If attention is restricted to nonrandomized procedures,
as can always be done by enlarging the sample space, a selection procedure
is a partition of the sample space into the sets $D_{i_1, \ldots, i_k}$ of those sample points for
which the selected group consists of the populations with subscripts $i_1, \ldots, i_k$
and no others. To these must be added the set $D_0$ for which none of the populations
is selected. If the number of available populations is $a$, the number of sets
$D$ is $2^a$ since each subscript may or may not occur with all combinations of the
remaining subscripts.

Fortunately, selection procedures possess an equivalent, and for most purposes
much simpler, representation. Let $E_i$ be the set of sample points for which
the $i$th population is included in the selected group. Then each of the two systems
of sets $\{D\}$ and $\{E\}$ is uniquely determined by the other. In fact, $E_i$ is the union
of all those sets $D$ which have $i$ as one of their subscripts. Conversely,

    $D_{i_1, \ldots, i_k} = E_{i_1} \cap \cdots \cap E_{i_k} \cap \bar E_{j_1} \cap \cdots \cap \bar E_{j_{a-k}}$

where $j_1, \ldots, j_{a-k}$ are the subscripts different from $i_1, \ldots, i_k$ and $\bar E$ denotes
the complement of $E$. Instead of working with the sets $E_i$ in the enlarged sample
space, it is now more convenient to return to the possibility of randomization.
Each $E_i$ is then represented by a function $\psi_i$ defined over the sample space and
taking on values between 0 and 1, where $\psi_i(x)$ denotes the probability with which
the $i$th population is included in the selected group. A selection procedure is
characterized by the vector $\psi = (\psi_1, \ldots, \psi_a)$.⁴
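
The equivalence between the partition into sets D and the inclusion sets E can be made concrete on a finite sample space. The following is a minimal sketch under that assumption; the function names `E_from_D` and `D_from_E` and the six-point sample space are hypothetical and serve only to illustrate the two formulas above for nonrandomized procedures.

```python
from itertools import combinations

def E_from_D(D, a):
    """D maps each frozenset of selected subscripts to a set of sample points.
    E[i] is the union of all D-sets whose subscript set contains i."""
    return {i: set().union(*(pts for idx, pts in D.items() if i in idx))
            for i in range(1, a + 1)}

def D_from_E(E, a, sample_space):
    """Recover every D-set as the intersection of E_i over the selected
    subscripts with the complements of E_j over the remaining subscripts."""
    space = set(sample_space)
    D = {}
    for k in range(a + 1):
        for idx in combinations(range(1, a + 1), k):
            pts = set(space)
            for i in range(1, a + 1):
                pts &= E[i] if i in idx else (space - E[i])
            D[frozenset(idx)] = pts
    return D

# Hypothetical example with a = 2 populations and six sample points.
space = {1, 2, 3, 4, 5, 6}
D = {frozenset(): {1}, frozenset({1}): {2, 3},
     frozenset({2}): {4}, frozenset({1, 2}): {5, 6}}
E = E_from_D(D, 2)
D_back = D_from_E(E, 2, space)
```

Going from D to E and back returns the original partition, and the number of D-sets is 2^a, here 4.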

According to the formulation of the preceding section we are concerned with 
a minimax problem subject to side conditions. This type of problem was in- 
vestigated by Blyth [31] but his conditions do not apply to the cases to be con- 
sidered here. The following lemma is an immediate extension of the standard 
method of characterizing minimax solutions as Bayes solutions corresponding to 
a least favorable a priori distribution. As in its more usual form, it is essentially 
an application of the Lagrange method of undetermined multipliers. 

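LEMMA 1. Let $\mathcal{A}$ be a $\sigma$-field of subsets of the parameter space $\Omega$ and let $\lambda$ and $\mu$
be probability distributions over $(\Omega, \mathcal{A})$. Let $A$, $B$ be two positive constants and let
$\delta_0$ maximize the integral

(9)    $B \int S(\theta, \delta)\, d\mu(\theta) - A \int R(\theta, \delta)\, d\lambda(\theta)$.

Then $\delta_0$ minimizes $\sup R(\theta, \delta)$ subject to

(10)    $\inf S(\theta, \delta) \ge \gamma$

provided

(11)    $\int R(\theta, \delta_0)\, d\lambda(\theta) = \sup R(\theta, \delta_0)$

and

(12)    $\int S(\theta, \delta_0)\, d\mu(\theta) = \inf S(\theta, \delta_0) = \gamma$.

If $\delta_0$ is the unique procedure maximizing (9), it is also the unique solution of the
restricted minimax problem.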

PROOF. Let $\delta$ be any procedure satisfying (10). Then

    $B \int S(\theta, \delta)\, d\mu(\theta) - A \int R(\theta, \delta)\, d\lambda(\theta)$
        $\le B \int S(\theta, \delta_0)\, d\mu(\theta) - A \int R(\theta, \delta_0)\, d\lambda(\theta) = B\gamma - A \sup R(\theta, \delta_0)$.

Since $\delta$ satisfies (10), it follows that

    $\sup R(\theta, \delta_0) \le \int R(\theta, \delta)\, d\lambda(\theta) \le \sup R(\theta, \delta)$

as was to be proved.

⁴ This representation was first utilized in a slightly more special form by Robbins [23].
A generalization was given by the author in Theorem 1 of [30]. I am grateful to Professor
L. LeCam for pointing out an error in the generalization of Theorem 1 to randomized
procedures. The displayed equivalence formulae at the top of p. 6 of [30] are not correct.
However, the equivalence theorem itself remains correct even when randomization is permitted.
This can be seen as above, by representing a randomized procedure as a nonrandomized
procedure in an enlarged sample space and applying Theorem 1.

If condition (10) is to hold only when the inf is taken over a subset $\Omega'$ of $\Omega$,
$\mu$ must be a distribution over $\Omega'$ but no other changes are necessary.

As is usually the case with minimax problems, the more difficult part of the
solution is not the maximization of (9) but the determination of an appropriate
$\lambda$ and $\mu$. In this connection, the following standard devices are helpful.

1. Condition (11) implies that $\lambda$ assigns probability one to the set $\omega$ of
parameter points $\theta$ for which

(13)    $R(\theta, \delta_0) = \sup_{\theta'} R(\theta', \delta_0)$.

Similarly, $\mu$ must assign probability one to the set for which

    $S(\theta, \delta_0) = \inf_{\theta'} S(\theta', \delta_0)$.

2. The pair $(\lambda, \mu)$ is least favorable in the sense that it minimizes the maximum
(with respect to $\delta$) value of (9).

3. If the problem exhibits any symmetries, it pays to look for distributions
$\lambda$, $\mu$ possessing the corresponding symmetries.

We shall in the following consider procedures which determine the selection
or nonselection of the $i$th population on the basis of real-valued statistics $T_i$,
and in particular we shall prove certain minimax properties for procedures of
the type

(14)    $\psi_i = 1, \lambda_i, 0$ as $T_i >, =, < C_i$.


For this purpose it is convenient first to state the following lemma, the proof 
of which is immediate. 

LEMMA 2. Suppose that the distribution of $T_i$ depends only on $\theta_i$ and is stochastically
increasing in $\theta_i$. Let $\delta = (\psi_1, \ldots, \psi_a)$ be any procedure satisfying (14) and
let $I$ be the set of subscripts $i$ for which

    $E_{\theta_0 + \Delta} \psi_i = \min_{j = 1, \ldots, a} E_{\theta_0 + \Delta} \psi_j$.

Then, if $S$ is given by one of the quantities (a), (c) or (d) of Section 3, $\inf_{\Omega'} S(\theta, \delta)$
is attained at all points $\theta$ such that for some $i \in I$

    $\theta_i = \theta_0 + \Delta$ and $\theta_j < \theta_0 + \Delta$ for all $j \ne i$.

If $S$ is given by (b), $\inf_{\Omega'} S(\theta, \delta)$ is attained at all points $\theta$ such that for some subset
$\{i_1, \ldots, i_k\}$ of $I$

    $\theta_{i_1} = \cdots = \theta_{i_k} = \theta_0 + \Delta$ and $\theta_j < \theta_0 + \Delta$ for the remaining $\theta$'s.

If $R$ is given by (i) or (ii), then $\sup_{\Omega} R(\theta, \delta)$ is attained at the point

    $\theta^0 = (\theta_0, \ldots, \theta_0)$.

We note also that if in addition to the assumptions made above, the joint
distribution of $(T_1, \ldots, T_a)$ is stochastically increasing in $(\theta_1, \ldots, \theta_a)$, and if
$S$ is given by (e) of Section 3, then $\inf_{\Omega'} S(\theta, \delta)$ is attained at the
point $(\theta_0 + \Delta, \ldots, \theta_0 + \Delta)$.

The minimax solution can now be obtained under the following conditions. 

THEOREM. Let the probability density of $X$ be denoted by $p_0$ when $\theta_1 = \cdots =
\theta_a = \theta_0$, and by $p_i$ when $\theta_i = \theta_0 + \Delta$ and the parameters $\theta_j$ for $j \ne i$ have a
common value $\theta' \le \theta_0 + \Delta$ determined so that the conditions below are satisfied. Suppose
that $p_i(x)/p_0(x)$ is a nondecreasing function of a real-valued statistic $T_i$, that the
distribution of $T_i$ depends only on $\theta_i$, is stochastically increasing in $\theta_i$, and is
independent of $i$. Then the procedure $\delta_0$ satisfying (4) and (5) with $S$ equal to any one
of the quantities (a)-(d) and $R$ defined by (i) or (ii) of the preceding section, is
given by

(15)    $\psi_i = 1, \lambda, 0$ as $T_i >, =, < C$

where $\lambda$ and $C$ are determined by

(16)    $E_{\theta_0 + \Delta} \psi_i = \gamma$.

The solution of the dual problem in which (4) is replaced by

(17)    $\sup R(\theta, \delta) \le \gamma'$

is also given by (15), with $\lambda$ and $C$ now determined by

(18)    $R(\theta^0, \delta) = \gamma'$

where $\theta^0 = (\theta_0, \ldots, \theta_0)$.
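PROOF.
1. In the notation of Lemma 1, let $\lambda$ be the distribution which assigns probability
one to the point $\theta^0 = (\theta_0, \ldots, \theta_0)$ and $\mu$ the distribution which assigns
probability $1/a$ to each of the points $\theta^{(i)}$ given by

(19)    $\theta_i = \theta_0 + \Delta$, $\theta_j = \theta' \le \theta_0 + \Delta$ for $j \ne i$.

Then it follows from Lemma 2 that $\delta_0$ satisfies conditions (11) and (12).

2. For the distributions $\lambda$ and $\mu$ specified in 1, and with $p_i$ denoting the
probability density of $X$ when $\theta = \theta^{(i)}$, we have

(20)    $R(\theta^0, \delta) = \int (\psi_1 + \cdots + \psi_a) p_0$;  $S(\theta^{(i)}, \delta) = \int \psi_i p_i$,

and hence (9) reduces to

(21)    $\frac{B}{a} \sum_{i=1}^{a} \int \psi_i p_i - A \int \sum_{i=1}^{a} \psi_i p_0 = \int \sum_{i=1}^{a} \psi_i \left(\frac{B}{a} p_i - A p_0\right)$.

Since $0 \le \psi_i \le 1$, (21) is maximized by putting $\psi_i = 0$ or 1 as

    $(B/a) p_i <$ or $> A p_0$,

and hence as $T_i$ is $<$ or $> C$, as was to be proved.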
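
The last step of the proof, maximizing an integral of the form (21) by choosing each critical function pointwise, can be checked by brute force on a small discrete sample space. The sketch below is an illustration only: the two densities `f0` and `f1`, the constants `A` and `B`, and all function names are assumptions, and sums over a finite sample space stand in for the integrals.

```python
from itertools import product

f0 = {0: 0.5, 1: 0.3, 2: 0.2}   # hypothetical density of one X_i under theta_0
f1 = {0: 0.2, 1: 0.3, 2: 0.5}   # hypothetical density under theta_0 + Delta
a, A, B = 2, 1.0, 3.0
points = list(product(f0, repeat=a))   # sample space of (x_1, x_2)

def p0(x):
    # joint density when both parameters equal theta_0
    return f0[x[0]] * f0[x[1]]

def p(i, x):
    # joint density when population i is shifted and the other stays at theta_0
    return f1[x[i]] * f0[x[1 - i]]

def objective(i, psi):
    # i-th summand of (21) on the finite sample space; psi maps point -> 0 or 1
    return sum(psi[x] * ((B / a) * p(i, x) - A * p0(x)) for x in points)

def pointwise_rule(i):
    # put psi_i = 1 exactly where (B/a) p_i exceeds A p_0
    return {x: 1 if (B / a) * p(i, x) > A * p0(x) else 0 for x in points}

def brute_force_max(i):
    # try every 0/1 critical function on the nine sample points
    return max(objective(i, dict(zip(points, bits)))
               for bits in product((0, 1), repeat=len(points)))

gaps = [brute_force_max(i) - objective(i, pointwise_rule(i)) for i in range(a)]
```

In this example the likelihood ratio `f1[x] / f0[x]` is increasing in `x`, so the pointwise rule is a cut-off rule in `x[i]`, in line with the conclusion of the theorem.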





We note that it is actually not necessary for $p_i(x)/p_0(x)$ to be an increasing
function of $T_i$, but only that there exist a constant $k$ such that the regions $T_i > C$
and $T_i < C$ (for the particular value $C$ determined by the side conditions) are
equivalent to the regions $p_i(x)/p_0(x) > k$ and $< k$ respectively.

This theorem provides the basis for determining the sample size necessary to
control the risks $R$ and $S$ at any desired levels. For suppose that we wish the
selection procedure to satisfy

    $R(\theta, \delta) \le \gamma'$ for all $\theta \in \Omega$

and

    $S(\theta, \delta) \ge \gamma$ for all $\theta \in \Omega'$.

Then for the smallest sample size (possibly randomized) which constitutes a
solution to this problem, the associated procedure $\delta_0$ minimizes $\sup R(\theta, \delta)$
subject to (4). If the conditions of the theorem are satisfied, $\delta_0$ is therefore given by
(15) and hence satisfies (if we assume for simplicity that it is nonrandomized)

    $\sup R(\theta, \delta_0) = R(\theta^0, \delta_0) = \begin{cases} a P_{\theta_0}(T_i \ge C) & \text{if } R \text{ is given by (i)} \\ P_{\theta_0}(T_i \ge C) & \text{if } R \text{ is given by (ii).} \end{cases}$

It further satisfies the condition

    $\inf S(\theta, \delta_0) = P_{\theta_0 + \Delta}(T_i \ge C)$.

If we let

    $\gamma^* = \begin{cases} \gamma'/a & \text{when } R \text{ is given by (i)} \\ \gamma' & \text{when } R \text{ is given by (ii),} \end{cases}$

the sample size is therefore determined by the conditions

    $P_{\theta_0}(T_i \ge C) \le \gamma^*$,  $P_{\theta_0 + \Delta}(T_i \ge C) \ge \gamma$.

These are exactly the conditions appropriate for testing the hypothesis $\theta_i = \theta_0$
against the alternative $\theta_i = \theta_0 + \Delta$ if we wish to have significance level $\gamma^*$ and
power at least $\gamma$. In all particular cases considered in the following section, the
sample size determination therefore reduces to a problem whose solution is known
from the corresponding problem of hypothesis testing.
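
To make the reduction to a testing problem concrete, here is a minimal sketch for one special case that is assumed purely for illustration: normal populations with known common standard deviation `sigma`, with T_i the mean of n observations. The function names and the numerical inputs are hypothetical. The level is taken to be the bound on false positives divided by a under criterion (i), and the bound itself under criterion (ii).

```python
import math
from statistics import NormalDist

def sample_size(a, delta, sigma, gamma, gamma_prime, criterion="i"):
    """Smallest n with level gamma_star at theta_0 and power >= gamma at
    theta_0 + delta, for a one-sided test based on a normal mean.
    Assumes 0 < gamma_star < 1 and 0 < gamma < 1."""
    gamma_star = gamma_prime / a if criterion == "i" else gamma_prime
    z = NormalDist().inv_cdf
    root = (z(1 - gamma_star) + z(gamma)) * sigma / delta
    n = max(1, math.ceil(root ** 2)) if root > 0 else 1
    return n, gamma_star

def cutoff_and_power(n, theta0, delta, sigma, gamma_star):
    """Cut-off C with P_{theta0}(mean >= C) = gamma_star, and the power
    P_{theta0 + delta}(mean >= C) at that cut-off."""
    se = sigma / math.sqrt(n)
    C = theta0 + NormalDist().inv_cdf(1 - gamma_star) * se
    power = 1 - NormalDist(theta0 + delta, se).cdf(C)
    return C, power

n, gamma_star = sample_size(a=5, delta=1.0, sigma=2.0, gamma=0.9, gamma_prime=0.5)
C, power = cutoff_and_power(n, 0.0, 1.0, 2.0, gamma_star)
```

For other families the same two conditions apply, with the normal quantiles replaced by those of the distribution of T_i.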


5. Families with monotone likelihood ratio. The theorem of the preceding
section applies directly to the case of independent samples $X_{i1}, \ldots, X_{in}$ from
populations with probability density $f_{\theta_i}$ depending only on the real-valued
parameter $\theta_i$ with respect to which we wish to select, if there exists a sufficient
statistic $T_i$ for $(X_{i1}, \ldots, X_{in})$ with monotone likelihood ratio. Let the probability
density of $T_i$ be $g_{\theta_i}$, and take $\theta_0$ for the value $\theta'$ of the theorem. Then

    $\frac{p_i(x)}{p_0(x)} = \frac{g_{\theta_0 + \Delta}(t_i) \prod_{j \ne i} g_{\theta_0}(t_j)}{g_{\theta_0}(t_i) \prod_{j \ne i} g_{\theta_0}(t_j)} = \frac{g_{\theta_0 + \Delta}(t_i)}{g_{\theta_0}(t_i)}$

which is nondecreasing in $t_i$, and it follows that the minimax procedure is given
by (15).

In particular, if $n = 1$ and the probability densities $f_{\theta_i}(x_i)$ have monotone
likelihood ratio in $x_i$, the result holds with $T_i = X_i$ so that the procedure is
given by

(22)    $\psi_i = 1, \lambda, 0$ as $x_i >, =, < C$.

Examples of this are the case in which the random variables $X_i$ are independently
distributed with binomial distributions $b(p_i, m)$, with Poisson distributions
$P(\tau_i)$, or more generally with distributions having densities of the form

    $C(\theta_i) e^{\theta_i x_i} h(x_i)$,

that is, belonging to an exponential family.

As another example, consider samples $(X_{i1}, \ldots, X_{in})$ from normal distributions
$N(\xi_i, \sigma_i^2)$ and suppose we wish to select the populations with small variances.
Attention may be restricted to the sufficient statistics $\bar X_1, \ldots, \bar X_a$ and
$S_1^2, \ldots, S_a^2$ where

    $\bar X_i = \sum_{j=1}^{n} X_{ij}/n$;  $S_i^2 = \sum_{j=1}^{n} (X_{ij} - \bar X_i)^2$.

Since the problem remains invariant⁵ under addition of arbitrary constants $c_i$ to
$\bar X_i$, it follows from a trivial extension of the Hunt-Stein theorem ([32], p. 336)
that there exists a minimax solution depending only on the variables $S_1^2, \ldots, S_a^2$.
To these variables the theorem is now applicable with $n = 1$, and
shows that the minimax procedure consists in selecting the populations for
which $S_i^2 \le C$.

Suppose that instead it is desired to select the populations for which the
parameters $\theta_i = \xi_i/\sigma_i$ are sufficiently large. This time the problem remains
invariant under multiplication of $\bar X_i$ by any positive constant $c_i$ and of $S_i^2$ by $c_i^2$.
There exists therefore a minimax solution depending only on the variables
$T_i = \bar X_i/S_i$. Since the $T_i$ have noncentral $t$-distributions which possess monotone
likelihood ratio, it follows that the minimax procedure consists in selecting
the populations for which $\bar X_i/S_i \ge C$.

If in this last problem the variances $\sigma_i^2$ are assumed to be independent of $i$,
with the common variance $\sigma^2$ still being unknown, the problem (of selecting for
$\xi_i/\sigma$ or $\xi_i$) surprisingly is much less simple since the $T_i$ are then dependent. We
shall consider this case in Section 7.

We conclude the present section by showing that in all the problems considered
above, if criterion (e) of Section 3 is used instead of one of the criteria
(a)-(d), the minimax solution is no longer given by (15). The argument is
sufficiently clearly indicated by considering the case $a = 2$. Suppose therefore
that $X_1$, $X_2$ are independently distributed with densities $f_{\theta_1}(x)$ and $f_{\theta_2}(x)$ which
have monotone likelihood ratio. We shall also assume that the associated cumulative
distribution functions are continuous; if the region of positive density is
independent of $\theta$, this can always be achieved by adjoining a uniformly
distributed random variable.

⁵ In the same strong sense as in the theory of hypothesis testing, that is, without
performing any transformations of the decision space.
Subject to

(23)    $\sup\, [\text{Expected number of false positives}] \le \gamma'$

we wish to maximize the minimum probability of including in the selected group
all good populations (if there are any). The procedure of the preceding section
includes the $i$th population when $X_i \ge C$ where $P_{\theta_0}\{X_i \ge C\} = \gamma'/2$. Then the
maximum expected number of false positives is exactly $\gamma'$ and, if

    $\beta = P_{\theta_0 + \Delta}(X_i \ge C)$,

the minimum probability of including all good populations is $\beta^2$.

Consider now the following alternative procedure:

Include $\theta_1$ in the selected group
    if $X_1 \ge C$ and if $X_2 < C - \epsilon$ or $X_2 \ge C$.
Include $\theta_2$ in the selected group
    if $X_2 \ge C$ and if $X_1 < C - \epsilon$ or $X_1 \ge C$.

In addition include both $\theta_1$ and $\theta_2$ if $C - \epsilon \le X_1, X_2 < C$.

For $\epsilon$ sufficiently small, it is then easily checked that the probability of including
$\theta_i$ in the selected group when $\theta_i \le \theta_0$ is still $\le \gamma'/2$ so that the procedure
continues to satisfy (23). The probability of including in the selected group all good
populations if only one of the $\theta$'s is $\ge \theta_0 + \Delta$ is now less than its previous value
$\beta$ but by continuity is $> \beta^2$ for $\epsilon$ sufficiently small. On the other hand, the
probability of including all good populations when both $\theta_1$ and $\theta_2$ are $\ge \theta_0 + \Delta$ is
now clearly $> \beta^2$ since the set of sample points for which both populations are
included has been increased by the set $C - \epsilon \le X_1, X_2 < C$. Hence for $\epsilon$
sufficiently small, the minimum probability of including all good populations is
now $> \beta^2$ as was to be proved.


6. Unequal sample sizes. The case of unequal sample sizes requires a slight 
generalization of the theorem of Section 4. 

THEOREM. Let $T_i$ ($i = 1, \ldots, a$) be a real-valued statistic whose distribution
depends only on $\theta_i$ and is stochastically increasing in $\theta_i$. Let $\delta_0 = (\psi_1, \ldots, \psi_a)$ be
the selection procedure defined by

(24)    $\psi_i = 1, \lambda_i, 0$ as $T_i >, =, < C_i$

where $\lambda_i$, $C_i$ are determined by (16). Let $p_i(x)$ and $p_0(x)$ be defined as in Section 4
and suppose that there exist constants $k_i$ such that

    $p_i(x)/p_0(x) >, =, < k_i$ as $T_i >, =, < C_i$.

Then, if $S$ is equal to one of the quantities (a)-(d) and $R$ is defined by (i) or (ii),
$\delta_0$ minimizes $\sup_{\Omega} R(\theta, \delta)$ subject to (4). If instead of by (16), the constants $\lambda_i$, $C_i$
are determined by (18) and

(25)    $E_{\theta_0 + \Delta} \psi_i$ is independent of $i$,

$\delta_0$ maximizes $\inf_{\Omega'} S(\theta, \delta)$ subject to (17).

Thus in both cases, the critical functions $\psi_i$, which can be interpreted as tests
of the hypotheses $H_i : \theta_i \le \theta_0$ against the alternatives $K_i : \theta_i \ge \theta_0 + \Delta$, are not
determined to give a constant (independent of $i$) significance level but instead
so that the minimum power against $K_i$ is constant.

The reason for this is clear: the probability of the $i$th population giving rise
to a false positive takes on its maximum value at the same point $\theta$ for all $i$;
hence the contributions of the various populations to the expected number or
proportion of false positives are combined and do not have to be controlled
individually. On the other hand, the probability of the $i$th population resulting
in a true positive takes on its minimum value at a different point $\theta$ for each $i$.
For the minimum of these minima to be a maximum, they have to be equal.

The lack of symmetry shown by the solution is of course a consequence of the 
asymmetric formulation of Section 3, where the consideration was shifted from 
false negatives to true positives. However, this asymmetry is not artificial but 
only reflects a corresponding asymmetry of the problem. 

The proof of the result for unequal sample sizes parallels that of the special
case when the sample sizes are equal. Let $\delta_0$ be the procedure determined by (24),
(25) and (16) or (18). In the notation of Lemma 1, let the distribution $\lambda$ be defined
as before but let $\mu$ assign to the point $\theta^{(i)}$, instead of $1/a$, a probability $\pi_i$ to be
determined later. The quantity (21) then becomes

(26)    $\int \sum_i \psi_i (B \pi_i p_i - A p_0)$

which is maximized by putting $\psi_i = 0$ or 1 as

    $p_i/p_0 <$ or $> A/B\pi_i$.

Hence if $B/A$ and $\pi_i$ are determined so that $A/B\pi_i$ is equal for each $i$ to the
constant $k_i$ defining $\delta_0$, it follows that $\delta_0$ has the desired minimax property.
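
The effect of unequal sample sizes can be seen in a small numerical sketch. It assumes, purely for illustration, normal populations with known standard deviation `sigma` and T_i equal to the mean of n_i observations; the function name and the inputs are hypothetical. Each cut-off C_i is chosen so that the power at the smallest positive parameter value equals gamma, as in (16), and the resulting significance levels are then seen to differ from population to population.

```python
import math
from statistics import NormalDist

def cutoffs_equal_power(sizes, theta0, delta, sigma, gamma):
    """For each n_i choose C_i so that P_{theta0 + delta}(mean_i >= C_i) = gamma,
    and report the (unequal) level P_{theta0}(mean_i >= C_i)."""
    z_gamma = NormalDist().inv_cdf(gamma)
    rows = []
    for n in sizes:
        se = sigma / math.sqrt(n)
        C = theta0 + delta - z_gamma * se
        level = 1 - NormalDist(theta0, se).cdf(C)
        rows.append((n, C, level))
    return rows

rows = cutoffs_equal_power([4, 16, 64], theta0=0.0, delta=1.0, sigma=2.0, gamma=0.9)

# Under criterion (i), the maximum expected number of false positives is the
# sum of the individual levels, attained when every parameter equals theta0.
max_expected_false_positives = sum(level for _, _, level in rows)
```

Populations with larger samples receive smaller levels, so the total expected number of false positives is spent mostly on the populations about which least is known.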


7. Normal populations with common unknown variance. Let $X_{ij}$ ($j = 1, \ldots,
n_i$; $i = 1, \ldots, a$) be normally distributed as $N(\xi_i, \sigma^2)$, and suppose we
wish to select the populations with large values of $\xi_i/\sigma$. More specifically, we shall
consider a population as negative if $\xi_i/\sigma \le 0$ and as positive if $\xi_i/\sigma \ge \Delta$. A set
of sufficient statistics is given by the means $\bar X_i$ together with

    $S^2 = \sum_i \sum_j (X_{ij} - \bar X_i)^2$.

Since the problem remains invariant under multiplication of each $\bar X_i$ by the same
positive constant $c$ and of $S^2$ by $c^2$, there exists a minimax procedure depending
only on the variables $Y_i = \bar X_i/S$. If $\theta_i = \xi_i/\sigma$, the joint density of the $Y$'s is,
up to a factor not depending on the $y$'s,

(27)    $p_\theta(y_1, \ldots, y_a) = \left(1 + \sum n_i y_i^2\right)^{-(f+a)/2} \int_0^\infty w^{f+a-1} \exp\left[-\frac{w^2}{2} + \frac{w \sum n_i \theta_i y_i}{\sqrt{1 + \sum n_i y_i^2}}\right] dw$,

where

    $f = \sum_{i=1}^{a} (n_i - 1)$.
From a consideration of the individual hypotheses $H_i : \theta_i \le 0$, it appears
natural to include the $i$th population in the selected group if

(28)    $\bar X_i/S \ge C_i$.

For this procedure it follows as before that the expected number or proportion of
false positives takes on its maximum value at $\theta^0 = (0, \ldots, 0)$ while the quantities
(a)-(d) take on their minimum value at (among other points) the points
$\theta^{(i)} = (0, \ldots, 0, \Delta, 0, \ldots, 0)$.

If the joint density of the $Y$'s at $\theta^0$ and $\theta^{(i)}$ is denoted by $p_0$ and $p_i$, it is seen
from (27) that $p_i/p_0$ is an increasing function of

    $\frac{y_i}{\sqrt{1 + \sum_j n_j y_j^2}} = \frac{\bar X_i}{\sqrt{S^2 + \sum_j n_j \bar X_j^2}}$.

The selection of the $i$th population when $p_i/p_0$ is sufficiently large thus leads to
selecting the populations for which

(29)    $\frac{\bar X_i}{\sqrt{S^2 + \sum_j n_j \bar X_j^2}} > C_i$.

This corresponds to the solution proposed by Paulson [15] and Pfanzagl [18] for
the associated slippage problem in which the standard is replaced by a control.⁶
It is however not a solution to the present problem. For as $\theta_j \to -\infty$ for some
$j \ne i$, the probability of the inequality (29) tends to zero, and so therefore does
the minimum value of each of the quantities (a)-(d).

Sometimes it is not unreasonable to assume a priori that

(30)    $\theta_i \ge 0$ for all $i$.

If for example we wish to select among a number of possible enrichments of a
substandard diet, we may be willing to assume that the effect of each, if any, is
beneficial. While under this assumption, for most significance levels and sample
sizes, the performance of the procedure (29) is no longer as drastic as before, the
procedure is nevertheless still not a minimax solution. This is easily seen from
the fact that the probability of the inequality (29), subject to $\theta_i = \Delta$ and
$0 \le \theta_j \le \Delta$ for all $j \ne i$, takes on its minimum value not at $\theta^{(i)}$ but instead at
the point $\theta_1 = \cdots = \theta_a$. The proof is similar (but simpler) to the one given
below.

The question arises whether the intuitive procedure (28) is, as one might expect,
a minimax solution. We shall prove in the next section that this is not the
case, under the a priori assumption (30). It seems likely that the situation is the
same even without this assumption. However, as we shall now show, (28) is
then at least approximately minimax in the sense that subject to (4), the maximum
expected number of false positives can be improved only slightly
over its value for (28). We shall show this with the constant $C_i$ determined by
(25).

⁶ A slippage procedure corresponding to (28) was proposed by Paulson in [9]; the complete
procedure (28), with the standard replaced by a control, is discussed by Dunnett [33].


Condition (4) implies that for all $i$,

(31)    $E\psi_i \ge \gamma$ when $\theta_i \ge \Delta$ and $\theta_j < \Delta$ for $j \ne i$.

Since the procedure (28), which for the moment we shall denote by $\psi^* =
(\psi_1^*, \ldots, \psi_a^*)$, attains the maximum expected number of false positives when
$\theta_1 = \cdots = \theta_a = 0$, the minimax value of this quantity can be no higher than

    $E_{0, \ldots, 0}(\psi_1^* + \cdots + \psi_a^*) = \alpha_1 + \cdots + \alpha_a$

where

    $\alpha_i = P_0\{\bar X_i/S > C_i\}$,

$C_i$ being determined by $E_\Delta \psi_i^* = \gamma$.

On the other hand, this minimax value cannot be much lower than $\sum \alpha_i$.
For consider any procedure $\psi = (\psi_1, \ldots, \psi_a)$ satisfying (31). Its maximum
expected number of false positives is greater than or equal to $E_{0, \ldots, 0}(\psi_1 + \cdots + \psi_a)$.
Consider now the problem of minimizing $E_{0, \ldots, 0}(\psi_1 + \cdots + \psi_a)$ subject to
(31). If we restrict (31) to the parameter values $\theta_i \ge \Delta$ and $\theta_j = 0$ for $j \ne i$,
the solution becomes

    $\psi_i = 1$ when $\bar X_i \Big/ \sqrt{S^2 + \sum_{j \ne i} n_j \bar X_j^2} > C_i'$.

Thus $\sum \alpha_i'$ where $\alpha_i' = P_{0, \ldots, 0}\left[\bar X_i \Big/ \sqrt{S^2 + \sum_{j \ne i} n_j \bar X_j^2} > C_i'\right]$ constitutes a lower
bound for the sought-for minimax value. However, for typical values of the sample
sizes, $\alpha_i'$ will be only slightly lower than $\alpha_i$, the only difference being the $a - 1$
added degrees of freedom in the denominator of the $t$-statistic, which now has
$\sum n_i - 1$ degrees of freedom instead of the $\sum (n_i - 1)$ in the case of (28).

The same argument shows that (28) approximately minimizes the maximum
expected proportion of false positives.


8. A counterexample. We shall now construct, for the case of equal sample
sizes, a procedure satisfying (23) and with larger minimum value of (a)-(d) than
(28). To this end we shall first prove the existence of points $(y_1, \ldots, y_a)$ and
$(y_1', \ldots, y_a')$ with $y_1 < C < y_1'$ and such that

    $\frac{p_{\Delta, \theta_2, \ldots, \theta_a}(y_1, \ldots, y_a)}{p_{0, 0, \ldots, 0}(y_1, \ldots, y_a)} > \frac{p_{\Delta, \theta_2, \ldots, \theta_a}(y_1', \ldots, y_a')}{p_{0, 0, \ldots, 0}(y_1', \ldots, y_a')}$

for all $0 \le \theta_2, \ldots, \theta_a \le \Delta$. It is seen from (27) that the probability ratio
$p_{\Delta, \theta_2, \ldots, \theta_a}/p_{0, \ldots, 0}$ is an increasing function of

(32)    $\left(\Delta n y_1 + n \sum_{j=2}^{a} \theta_j y_j\right) \Big/ \sqrt{1 + n \sum_{j=1}^{a} y_j^2}$.

Hence it is enough to construct the points $y$ and $y'$ in such a way that the expression
(32) (where without loss of generality we can put $n = 1$) exceeds the corresponding
expression when $y$ is replaced by $y'$. Putting $y_2 = \cdots = y_a$, $y_2' = \cdots =
y_a'$, letting $\rho = (\theta_2 + \cdots + \theta_a)/\Delta$ and writing $k$ for $a - 1$, we must find pairs
$(y_1, y_2)$, $(y_1', y_2')$ with $y_1 < C < y_1'$ and such that

    $\frac{y_1 + \rho y_2}{\sqrt{1 + y_1^2 + (a - 1) y_2^2}} > \frac{y_1' + \rho y_2'}{\sqrt{1 + y_1'^2 + (a - 1) y_2'^2}}$  for all $0 \le \rho \le k$.

Since all coordinates will be chosen to be nonnegative, the inequality can be
squared and on collection of powers of $\rho$ becomes

    $f(\rho) = a_2 \rho^2 + 2 a_1 \rho + a_0 > 0$

with

    $a_2 = y_2^2 (1 + y_1'^2) - y_2'^2 (1 + y_1^2)$
    $a_1 = y_1 y_2 [1 + y_1'^2 + (a - 1) y_2'^2] - y_1' y_2' [1 + y_1^2 + (a - 1) y_2^2]$
    $a_0 = y_1^2 [1 + (a - 1) y_2'^2] - y_1'^2 [1 + (a - 1) y_2^2]$.

A sufficient set of conditions for $f(\rho)$ to be positive for all $0 \le \rho \le k$ is $a_2 < 0$,
$f(0) > 0$, $f(k) > 0$, and hence

    $a_2 < 0$,  $a_0 > 0$,  $a_2 k^2 + 2 a_1 k + a_0 > 0$.

For any fixed $y_1$, $y_1'$ and $y_2$, the first two of these conditions are satisfied if $y_2'$ is
sufficiently large. The coefficient of $y_2'^2$ in $a_2 k^2 + 2 a_1 k + a_0$ is $-k^2 (1 + y_1^2) +
2 k y_1 y_2 (a - 1) + (a - 1) y_1^2$. If $y_2$ is chosen large enough so that this coefficient
is positive, the third condition is also satisfied for $y_2'$ sufficiently large, and this
completes the proof. The two points constructed in this way will be denoted by
$y^0 = (y_1^0, \ldots, y_a^0)$ and $y'^0 = (y_1'^0, \ldots, y_a'^0)$. We note for later use that the
points can be chosen in such a way that $C < y_j, y_j'$ for all $j > 1$.

Let $R$ and $R'$ denote two spheres with centers at $y^0$ and $y'^0$, and radii determined
so that

(33)    $P(R) = P(R')$ when $\theta_1 = \cdots = \theta_a = 0$.

In addition, the spheres are to be sufficiently small so that

(34)    $\frac{\Delta y_1 + \sum_{j=2}^{a} \theta_j y_j}{\sqrt{1 + \sum_{j=1}^{a} y_j^2}} > \frac{\Delta y_1' + \sum_{j=2}^{a} \theta_j y_j'}{\sqrt{1 + \sum_{j=1}^{a} y_j'^2}}$  for all $0 \le \theta_2, \ldots, \theta_a \le \Delta$,

that $y_1 < C < y_1'$ for all $y \in R$ and $y' \in R'$, and that further conditions are satisfied
which will be specified later.

Consider now the following modification of (28):
the 1st population is selected if the sample point satisfies

(35)    ($y \in R$) or ($y_1 > C$ and $y \notin R'$),

and the rule for selecting the other populations is defined by symmetry.

By (33), the expected number of false positives, when $\theta_1 = \cdots = \theta_a = 0$, is
the same for the modified procedure as for (28). Since under assumption (30)
the $i$th population can give rise to a false positive only when $\theta_i = 0$, it follows
that the modified procedure satisfies (23) if $R$ and $R'$ are sufficiently small.

Consider on the other hand one of the criteria (a)-(d), for example the probability
of including the best population in the selected group when its $\theta$-value is
$\ge \Delta$. Under (28) this attains its minimum value at the points whose coordinates
for some $i$ satisfy $\theta_i = \Delta$ and $0 \le \theta_j < \Delta$ for $j \ne i$. By (34), the probability
of including the best population is larger at all these points under the modified
procedure than under (28).

In order to prove that also the minimum probability of including the best population
has been increased by the modification, it is sufficient to show that
with the modified procedure this minimum probability is still attained at points
satisfying $\theta_i = \Delta$, $0 \le \theta_j < \Delta$ for $j \ne i$. This follows easily if we can show that
for the modified procedure the probability of including the $i$th population is an
increasing function of $\theta_i$ for fixed values of the other $\theta$'s. This result finally is an
immediate consequence of the following two facts.

1. The partial derivative

    $\frac{\partial}{\partial \theta_i} P_\theta\{\bar X_i/S > C\}$

is positive for all $\theta_i$ and is bounded away from 0 in any finite interval $a \le \theta_i \le b$.

2. As the radius of the sphere $R$ (and hence also of $R'$) tends to zero, the
derivatives

    $\frac{\partial}{\partial \theta_i} P_{\theta_1, \ldots, \theta_a}(R)$ and $\frac{\partial}{\partial \theta_i} P_{\theta_1, \ldots, \theta_a}(R')$

tend to zero.

PROOF.

1. Putting $\sigma = 1$ so that $\xi_i = \theta_i$, the derivative is equal to

    $\frac{\partial}{\partial \theta_i} \int_0^\infty P_{\theta_i}\{\bar X_i > Cs\}\, dP(s)$.

We can differentiate under the integral sign and the derivative of the integrand
is known to be positive. (See for example [32], p. 114, Problem 18.) Since this
derivative is a continuous function of $\theta_i$, it is bounded away from zero in any
finite interval.

2. Writing $P(R) = \int_R p_{\theta_1, \ldots, \theta_a}(y)\, dy$, the differentiation can be carried out
under the integral sign. Since for $(\theta_1, \ldots, \theta_a)$ in any finite interval the integrand
is uniformly bounded, the result follows.

Exactly the same argument applies if criterion (d) is replaced by (a) or (c).
However, with (b) the difficulty arises that the expected proportion of true positives
takes on its minimum value at all points which for some $1 \le i_1 < \cdots <
i_k \le a$, $1 \le k \le a$, satisfy

    $\theta_{i_1} = \cdots = \theta_{i_k} = \Delta$;  $0 \le \theta_j < \Delta$ for all $j \ne i_1, \ldots, i_k$.

To obtain an increase at all these points, we note that at the beginning of the
section we proved the existence of points $y = (y_1, \ldots, y_a)$ and $y' = (y_1', \ldots, y_a')$
with

    $y_1 < C < y_1'$ and $C < y_j, y_j'$ for all $j \ne 1$

and such that (34) holds.

Let $R_1$ and $R_1'$ denote the spheres previously denoted by $R$ and $R'$, and let $R_i$ and
$R_i'$ be defined by symmetry. The modified procedure as before consists in including
$\theta_i$ in the selected group if

    ($y \in R_i$) or if ($y_i > C$ and $y \notin R_i'$).

For this procedure it was shown previously that if $\theta_i = \Delta$, $0 \le \theta_j < \Delta$ for $j \ne i$,
the expected proportion of true positives has been increased by the modification.
Suppose now that two of the $\theta$'s are equal to $\Delta$, say $\theta_1 = \theta_2 = \Delta$, $0 \le \theta_j < \Delta$
for $j > 2$. Then twice the expected proportion of true positives equals

    $P\{\text{selecting } \theta_1\} + P\{\text{selecting } \theta_2\}$.

Since for the points $(\Delta, \Delta, \theta_3, \ldots, \theta_a)$ with $0 \le \theta_j < \Delta$ for $j > 2$ we have both
$P(R_1) > P(R_1')$ and $P(R_2) > P(R_2')$, it is seen that the expected proportion is
increased also in this case, and in the same way that it is increased at all points
at which it takes on its minimum under (28). The remainder of the argument
requires no change.

In conclusion we mention, without going into details, that even without the
restriction (30) the procedure (28) is not the solution of the problem of minimizing
the maximum expected number of false negatives subject to

    $\sup\, [\text{Expected number of false positives}] \le \gamma'$.

This follows more simply but by the same method as before from the fact that
the expected number of false negatives under (28) takes on its maximum at the
single point $\theta_1 = \cdots = \theta_a = \Delta$.


9. Decision theoretic approach. Although the formulations of Section 3 appear 
to the author to be more useful for most applications, the problems can also be 
treated from a purely decision theoretic point of view, with general loss functions 
replacing the consideration of true and false positives. In the present section 
such a treatment of the problems of Sections 5 and 6 will be sketched very briefly. 

Suppose that $X_{i1}, \ldots, X_{in_i}$ $(i = 1, \ldots, a)$ are independent samples, that
the distribution of the $i$th sample depends only on the parameter $\theta_i$ and that we
wish to select the populations with high $\theta$-values. Let the loss resulting from the
selection or nonselection of the $i$th population depend only on $\theta_i$ and be denoted
by $L_i(\theta_i)$ and $L_i'(\theta_i)$ respectively. Finally, let the over-all loss be the sum of the
individual losses.

Consider now the $i$th component problem, a two-decision problem for the
parameter $\theta_i$ with losses $L_i$ and $L_i'$. Suppose that the minimax solution $\psi_i$ for
this problem is a Bayes solution with respect to a least favorable a priori
distribution $\Lambda_i$ of $\theta_i$. Then $\psi_i$ is also the Bayes solution for this same two-decision
problem on the basis of all $\sum n_i$ observations with respect to the a priori
distribution $\Lambda_1(\theta_1) \times \Lambda_2(\theta_2) \times \cdots \times \Lambda_a(\theta_a)$ for the combined parameter $\theta =
(\theta_1, \ldots, \theta_a)$. This is an immediate consequence of the fact that the a posteriori
(marginal) distribution of $\theta_i$ given all the $\sum n_i$ $x$'s depends only on $x_{i1}, \ldots, x_{in_i}$.
It then follows from result (ii) (on p. 15) of [30] that the selection procedure
$(\psi_1, \ldots, \psi_a)$ is a minimax solution of the over-all problem.

Conditions under which the minimax solutions for the component problems 
are of the form (24) are given for example in [34] and [35]. The minimax property 
of procedure (24) in these cases is a slight generalization of a result of Robbins 
[23] and Hannan and Robbins [36], which was established there by quite different 
methods. It is suggested by these papers (see also Johns [37]) that for the problem 
under consideration there exist asymptotic subminimax procedures so that for 
large a, certain improvements over the above minimax procedure may be possible.


10. Comparison of normal means with a control. In the remaining two
sections we shall be concerned with problems in which the quality of the standard
is not assumed known but where instead a control group $X_{0j}$ $(j = 1, \ldots, m)$ is
observed in addition to the observations $X_{ij}$ $(j = 1, \ldots, n_i;\ i = 1, \ldots, a)$ on
the $a$ treatments.

We consider first the case that the $X_{ij}$ are independently distributed with
normal distributions $N(\xi_i, \sigma_1^2)$ for $i = 1, \ldots, a$ and $N(\xi_0, \sigma_0^2)$ for $i = 0$, and
assume to begin with that $\sigma_0^2$, $\sigma_1^2$ are known and that $n_i = n$ for $i = 1, \ldots, a$.
The averages $\bar X_0, \bar X_1, \ldots, \bar X_a$ are then sufficient statistics, independently
distributed with normal distributions $N(\xi_0, \tau_0^2)$ for $\bar X_0$ and $N(\xi_i, \tau_1^2)$ for $\bar X_i$
$(i = 1, \ldots, a)$ where

$$\tau_0^2 = \sigma_0^2/m \quad\text{and}\quad \tau_1^2 = \sigma_1^2/n.$$

As in Section 5, a slight generalization of the Hunt-Stein theorem permits a
reduction of the data. We may restrict attention to the variables $Y_i = \bar X_i - \bar X_0$
since by this theorem there exists a minimax solution which is invariant under a
common translation of all variables and since $Y_1, \ldots, Y_a$ constitute a maximal
set of invariants with respect to these transformations.

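Putting $\theta_i = \xi_i - \xi_0$, the conditional joint density of the $Y$'s given $\bar X_0 = x_0$
is (up to a constant factor)

$$\exp\Big\{-\frac{1}{2\tau_1^2}\sum_{j=1}^{a}\big[(y_j - \theta_j) + (x_0 - \xi_0)\big]^2\Big\}.$$

The joint density of the $Y$'s is therefore

$$c\int \exp\Big\{-\frac{1}{2\tau_1^2}\sum_{j=1}^{a}\big[(y_j - \theta_j) + u\big]^2 - \frac{u^2}{2\tau_0^2}\Big\}\,du$$

which after some simplification becomes

(36) $p(y) = C\exp\Big\{-\dfrac{1}{2\tau_1^2}\Big[\sum (y_j - \theta_j)^2 - \dfrac{\tau_0^2}{a\tau_0^2 + \tau_1^2}\Big(\sum (y_j - \theta_j)\Big)^2\Big]\Big\}.$

We shall now apply the theorem of Section 4 with $\theta = 0$ and $\theta' = \rho\Delta$. Then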
$p_i(y)/p_0(y)$ is the ratio of the two densities (36) for $\theta_i = \Delta$, $\theta_j = \rho\Delta$ $(j \ne i)$ and
$\theta_1 = \cdots = \theta_a = 0$. The quadratic terms in the exponent cancel and the linear
terms are, up to a factor $\Delta/\tau_1^2$,

$$y_i + \rho\sum_{j \ne i} y_j - \frac{\tau_0^2\,[1 + (a-1)\rho]}{a\tau_0^2 + \tau_1^2}\sum y_j
= (1 - \rho)\,y_i + \sum y_j\Big\{\rho - \frac{\tau_0^2\,[1 + (a-1)\rho]}{a\tau_0^2 + \tau_1^2}\Big\}.$$

If

(37) $\rho = \dfrac{\tau_0^2}{\tau_0^2 + \tau_1^2},$

the coefficient of $\sum y_j$ vanishes, so that

$$p_i(y)/p_0(y) = c\,\exp\Big\{\frac{\Delta(1 - \rho)}{\tau_1^2}\,y_i\Big\}$$

is an increasing function of $y_i$. For this value of $\rho$, the conditions of the theorem
of Section 4 are satisfied and the minimax procedure (15) thus reduces to


(38) ¥i=1 when y; = X¥;— Xo>C, 
where C is defined by 
0 
7 —_ 
V m mn 
This solution is easily extended to the case of unequal sample sizes and
unequal variances. If the variance of $\bar X_i$ is $\tau_i^2$ we find for the joint distribution of
the $Y$'s,


(39) Pi(X; —_ X >C) #=1-=+-¢6 


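$$p(y) = C\exp\Big\{-\frac12\Big[\sum \frac{(y_j - \theta_j)^2}{\tau_j^2} - \frac{\big(\sum (y_j - \theta_j)/\tau_j^2\big)^2}{1/\tau_0^2 + \sum 1/\tau_k^2}\Big]\Big\}.$$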


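We now apply the theorem given in Section 6, this time with $\theta_j' = \rho_i\Delta$, so
that $p_i(y)$ is the density of the $Y$'s when

$$\theta_i = \Delta, \qquad \theta_j = \rho_i\Delta \ \text{ for } j \ne i.$$

Then the quadratic terms in the exponent again cancel in the ratio $p_i(y)/p_0(y)$,
and the linear term is

$$\Big(\frac{1}{\tau_0^2} + \sum\frac{1}{\tau_k^2}\Big)^{-1}\Big[\Big(\frac{1}{\tau_0^2} + \sum\frac{1}{\tau_k^2}\Big)\sum\frac{y_j\theta_j}{\tau_j^2} - \sum\frac{y_j}{\tau_j^2}\sum\frac{\theta_j}{\tau_j^2}\Big],$$

all sums extending from 1 to $a$. For $j \ne i$, the coefficient of $y_j$ is (up to a factor
$\Delta[(1/\tau_0^2) + \sum(1/\tau_k^2)]^{-1}$)

$$\frac{\rho_i}{\tau_j^2}\Big(\frac{1}{\tau_0^2} + \sum\frac{1}{\tau_k^2}\Big) - \frac{1}{\tau_j^2}\Big[\frac{1}{\tau_i^2} + \rho_i\sum_{k \ne i}\frac{1}{\tau_k^2}\Big].$$

This will be zero if $\rho_i = \tau_0^2/(\tau_0^2 + \tau_i^2)$ and the coefficient of $y_i$ then becomes

$$\frac{1}{\tau_i^2}\Big[\frac{1}{\tau_0^2} + \sum\frac{1}{\tau_k^2}\Big](1 - \rho_i).$$

Since this is positive, $p_i(y)/p_0(y)$ is then an increasing function of $Y_i = \bar X_i - \bar X_0$
and the minimax procedure is therefore given by (24) where $C_i$ is given by (39)
with $n_i$ in place of $n$, or by (25).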
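
The form of this procedure can be illustrated with a short sketch. It assumes the cutoffs $C_i$ have already been obtained from (39) or (25), which are not reproduced here, and simply passes them in; the function name and the sample data are hypothetical.

```python
from statistics import fmean

def select_better_than_control(control, treatments, cutoffs):
    """Return indicators psi_i = 1 when mean(treatment_i) - mean(control) > C_i.

    The cutoffs C_i are taken as given; how they are determined is
    outside the scope of this sketch.
    """
    x0_bar = fmean(control)
    return [1 if fmean(x) - x0_bar > c else 0
            for x, c in zip(treatments, cutoffs)]

# Hypothetical data: one control sample and two treatment samples.
control = [1.0, 2.0, 3.0]
treatments = [[4.0, 5.0, 6.0], [2.0, 2.0, 3.0]]
print(select_better_than_control(control, treatments, [1.0, 1.0]))
```

Only the differences $\bar X_i - \bar X_0$ enter, in line with the reduction to the invariants $Y_i$.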


If the variables $X_{ij}$ $(j = 1, \ldots, n_i;\ i = 0, \ldots, a)$ all have common but
unknown variances, it follows as in Section 7 that the procedure given by

(40) $\psi_i = 1$ when $\dfrac{\bar X_i - \bar X_0}{\sqrt{\sum_{j=0}^{a}\sum_{k=1}^{n_j}(X_{jk} - \bar X_j)^2}} > C$

is approximately minimax, where $n_0$ replaces the earlier $m$.


11. Comparison of normal variances with a control. Let $X_{ij}$ $(j = 1, \ldots, n_i;\
i = 0, \ldots, a)$ be independently distributed with normal distributions $N(\xi_i, \sigma_i^2)$
($\xi_i$, $\sigma_i$ unknown), and consider the problem of selecting the populations for
which $\sigma_i^2/\sigma_0^2 \le \delta$. Application of the generalized Hunt-Stein theorem proves the
existence of a minimax procedure depending only on the statistics

$$S_i^2 = \sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2.$$

We may therefore restrict attention to $S_0^2, \ldots, S_a^2$ where the distribution of
$S_i^2/\sigma_i^2$ is $\chi^2_{f_i}$ with $f_i = n_i - 1$, and by another application of the same theorem
to the variables $V_i = S_i^2/S_0^2$ $(i = 1, \ldots, a)$. The joint density of the $V$'s is,
up to a constant factor,

$$\Big(1 + \sum_{j=1}^{a}\frac{\sigma_0^2}{\sigma_j^2}\,v_j\Big)^{-(f_0 + f_1 + \cdots + f_a)/2}\ \prod_{i=1}^{a}\frac{v_i^{(f_i - 2)/2}}{(2\sigma_i^2/\sigma_0^2)^{f_i/2}\,\Gamma(f_i/2)}.$$


Let us now apply the theorem of Section 6 with $\theta_i = \sigma_0^2/\sigma_i^2$, with $\theta = 1$ (so that
the conditions $\theta_1 = \cdots = \theta_a = \theta$ are equivalent to $\sigma_1^2 = \cdots = \sigma_a^2 = \sigma_0^2$); with
$1 + \Delta = 1/\delta$ (so that the condition $\theta_i \ge \theta + \Delta = 1 + \Delta$ is equivalent to
$\sigma_i^2/\sigma_0^2 \le \delta$); and $\theta_j' = \rho_i(1 + \Delta)$, where $\rho_i < 1$ is to be determined later. With
these values, the probability ratio $p_i/p_0$ is an increasing function of

$$\Big(1 + \sum v_j\Big)\Big/\Big(1 + \sum \theta_j v_j\Big).$$

Let us therefore consider the region

$$1 + \sum v_j \ge k_i\Big(1 + \sum \theta_j v_j\Big)$$

which is equivalent to

$$k_i(1 - \rho_i)(1 + \Delta)\,v_i \le (1 - k_i) + [1 - \rho_i k_i(1 + \Delta)]\sum_{j=1}^{a} v_j.$$

If we put $\rho_i = 1/k_i(1 + \Delta)$, this reduces to

$$v_i \le \frac{1 - k_i}{k_i(1 - \rho_i)(1 + \Delta)}$$

or equivalently to $v_i \le C_i$ with

$$C_i = \frac{1 - k_i}{k_i(1 + \Delta) - 1}.$$

As $k_i$ goes from $1/(1 + \Delta)$ to 1, $C_i$ goes from $\infty$ to 0, and for these values of
$k_i$, it is seen that $\rho_i < 1$. The theorem of Section 6 therefore shows that the
procedure $\psi_i = 1$ if $v_i \le C_i$ has the desired minimax property.
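
A minimal sketch of this variance-selection rule follows. The constants $k_i$ come from the theorem of Section 6 and are treated here as given inputs; the function names and the data are hypothetical.

```python
from statistics import fmean

def sum_of_squares(x):
    """S^2 = sum of squared deviations from the sample mean."""
    m = fmean(x)
    return sum((v - m) ** 2 for v in x)

def cutoff(k, delta):
    """C = (1 - k) / (k (1 + Delta) - 1), meaningful for 1/(1 + Delta) < k < 1."""
    return (1.0 - k) / (k * (1.0 + delta) - 1.0)

def select_small_variances(control, treatments, cutoffs):
    """psi_i = 1 when v_i = S_i^2 / S_0^2 <= C_i."""
    s0 = sum_of_squares(control)
    return [1 if sum_of_squares(x) / s0 <= c else 0
            for x, c in zip(treatments, cutoffs)]

# Hypothetical data; k = 0.75 and Delta = 1 are assumed values.
c = cutoff(0.75, 1.0)
print(c, select_small_variances([0, 2, 4], [[1, 2, 3], [0, 4, 8]], [c, c]))
```

The cutoff decreases from very large values to 0 as $k$ runs from $1/(1 + \Delta)$ to 1, which matches the range described in the text.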

As a last problem, consider a set of Poisson populations with Poisson
parameters $\lambda_0, \lambda_1, \ldots, \lambda_a$. The problem of selecting the populations for which $\lambda_i/\lambda_0 \ge
1 + \Delta$ is not meaningful in the formulation given here since the parameter pairs
(Xo, Ac = Ao) and (Ao, Ax = (1 + A)Ao) become indistinguishable as \) > ~. 
The problem could be treated with the minimax principle replaced by a suitable 
unbiasedness principle. Alternatively, if one is concerned with Poisson processes, 
one may instead of observing the number of occurrences in fixed intervals, take 
as observations the times required to get a specified number of occurrences. 
These times then follow gamma distributions, and the solution of the present 
section is directly applicable.

BAYES RULES FOR A COMMON MULTIPLE COMPARISONS PROBLEM
AND RELATED STUDENT-t PROBLEMS


By DAVID B. DUNCAN


University of North Carolina

0. Summary. The paper is mainly concerned with the following multiple
comparisons problem in the analysis of variance setting. In a balanced experiment $n$
treatments are to be compared. Each of the $\tfrac12 n(n-1)$ pairwise comparisons is
to be made, adjudging each difference as "positive", "negative", or "not
significant"; overall decisions involving intransitivities are barred. The loss for
each difference is proportional to the error; if a difference is asserted incorrectly
the loss has proportionality constant $c_1$, if "not significant" is the incorrect
conclusion the proportionality constant is $c_0$; where $c_1 = k_1 + k_0$, $c_0 = k_0$ and
$k_1 > k_0 > 0$. Total loss for the experiment is taken as the sum of the $\tfrac12 n(n-1)$
component losses. The Bayes rule for any prior distribution is shown as a result
to consist in the simultaneous application of Bayes rules to the $\tfrac12 n(n-1)$
component problems. Each of these in turn is shown similarly to consist in the
simultaneous application of Bayes rules to two subcomponent problems. The
subcomponent Bayes rule for a normal prior density of treatment means is
explicitly derived. The dependencies of the solution on the variance of the prior
density, the degrees of freedom and the loss ratio $k_1/k_0$ are discussed. A principal
finding is that the Bayes solution for the multiple comparisons problem
corresponds to a tolerated error probability "of the first kind" for each single
difference, that is independent of the number of treatments being compared.


1. Introduction. Many procedures have been proposed for the multiple
comparisons problem herein considered. These include, for example, a
"least-significant-difference" rule due to Fisher [5], an "honest-significant-difference" rule
due to Tukey [17], [18] and multiple range testing procedures due to Newman
[10], and the author [3]. Some of these have also been described in recent texts
such as Federer [4], Li [9], Snedecor [13], Scheffé [12] and Steel and Torrie [14].
With much help from the recent more general work of Lehmann [7] it has now
been possible to solve a Jeffreys-like Bayes formulation of the problem. This is
more complete than any of the previous formulations and leads to a simple
solution with properties that are better defined and that appear to be appropriate
to an appreciable class of practical situations. In the process, similar Jeffreys-like
Bayes formulations and their solutions are presented and obtained for two
common types of Student-t problems. These are developed first as problems of
separate interest in Sections 2 and 3. The main problem is then fully developed in
Section 4. A discussion of the more applied aspects, together with further illus- 
trations is planned for a paper to be submitted to Biometrics. 


2. A two-decision Student-t problem. Given a random observation $t$ from a
non-central $t$ distribution with non-centrality parameter $\tau$ and $\nu$ degrees of
freedom, a common problem is that of choosing between the two decisions

(2.1) $d_0$: decide $\tau \le \Delta$ and $d_1$: decide $\tau > \Delta$,

where $\Delta$ is some unspecified positive boundary value. In the language of the
experimenter, $d_0$ is the decision that $\tau$ is not significantly greater than zero and $d_1$
the decision that it is. In the theory of hypothesis testing the same problem is
often more loosely regarded as that of testing $H_0\colon \tau \le 0$ with the alternative
$H_1\colon \tau > 0$, the decisions thus being

(2.2) $d_0$: decide $\tau \le 0$ and $d_1$: decide $\tau > 0$.

Strictly speaking however the null decision does not deny the possibility of
positive though relatively small values for $\tau$ and some such formulation as (2.1) is
more precise. The change is relatively trivial in this problem by itself. It is
essential however to our subsequent developments as is brought out shortly after
(4.12). (See Lehmann [7] also for a similar change).

Our first result is a Bayes rule $\phi_*(t)$ for this problem with respect to a simple
linear loss function

(2.3) $L_0(\tau) = L(\tau, d_0) = \begin{cases} 0, & \tau \le 0,\\ k_0\tau, & \tau > 0,\end{cases} \qquad L_1(\tau) = L(\tau, d_1) = \begin{cases} k_1|\tau|, & \tau \le 0,\\ 0, & \tau > 0,\end{cases}$

where $k_0$ and $k_1$ are positive constants such that $k_1 > k_0$, and with respect to a
normal prior density for $\tau$,

(2.4) $\xi(\tau) = (2\pi\gamma^2)^{-1/2} e^{-\tau^2/2\gamma^2}, \quad -\infty < \tau < \infty,$

with mean zero and variance $\gamma^2$. The rule is of the common form

(2.5) $\phi_*(t) = \begin{cases} 0, & t < t_*,\\ 1, & t > t_*,\end{cases}$

where $\phi(t) = 0$ or 1 is the usual indicator function for making the decision $d_0$ and
$d_1$ respectively, and $t_* = t_*(k, \nu, \gamma^2)$ is a significant or critical $t$ ratio for which
a set of values are given in Table 1. The arguments determining the significant
value $t_*$ are the ratio $k = k_1/k_0$ from the loss function, the degrees of freedom $\nu$
for $t$ and the variance $\gamma^2$ of the prior density for $\tau$.

The rule $\phi_*(t)$ may be derived as follows. For the average risk of any rule

TABLE 1


Minimum-Average-Risk Significant t Values ($t_*$ Values)


Log k 
0.0 a ’ 1.5 2.0 


1.353 
1.379 
1.367 
1.356 
1.340 
1 


326 


2.503 
.926 
.718 
. 653 
. 582 
. 531 


no > 


m bo bo 


3. 741 
2.648 
2.303 

. 030 

875 


9.243 


= 


3.744 
2.693 
2.296 


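$\phi(t)$, we have

(2.6) $A(\xi, \phi) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [L_0(\tau)(1 - \phi(t)) + L_1(\tau)\phi(t)]\, f(t \mid \tau)\,dt\,\xi(\tau)\,d\tau = \text{constant} + \int_{-\infty}^{\infty} \phi(t)\,h_1(t)\,dt,$

where $f(t \mid \tau)$ is the non-central $t$ density function, (2.12) below, and

(2.7) $h_1(t) = \int_{-\infty}^{\infty} [L_1(\tau) - L_0(\tau)]\, f(t \mid \tau)\,\xi(\tau)\,d\tau.$

The minimum average risk rule may thus be written

(2.8) $\phi_*(t) = \begin{cases} 0, & h_1(t) > 0,\\ 1, & h_1(t) < 0.\end{cases}$

From the inequality $h_1(t) < 0$ we get

(2.9) $\int_{-\infty}^{0} k_1|\tau|\, f(t \mid \tau)\,\xi(\tau)\,d\tau < \int_{0}^{\infty} k_0\tau\, f(t \mid \tau)\,\xi(\tau)\,d\tau,$

and hence

(2.10) $h_2(t) = \dfrac{\int_0^\infty h_3(\tau, t)\,d\tau}{-\int_{-\infty}^{0} h_3(\tau, t)\,d\tau} > \dfrac{k_1}{k_0} = k,$

where

(2.11) $h_3(\tau, t) = \tau f(t \mid \tau)\,\xi(\tau).$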


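Now, introducing the non-central $t$ density in the form

(2.12) $f(t \mid \tau) = \int_0^\infty (2\pi)^{-1/2}\, e^{-\frac12 (u^{1/2}t - \tau)^2}\, u^{1/2}\,\psi(u \mid \nu)\,du, \quad -\infty < t < \infty,$

where $\psi(u \mid \nu)$ is the $\chi^2$-related density function of $u = \chi^2(\nu)/\nu$, that is

(2.13) $\psi(u \mid \nu) = \begin{cases} \nu(\nu u)^{\frac12\nu - 1} e^{-\frac12\nu u}\big/\big[(\tfrac12\nu - 1)!\,2^{\frac12\nu}\big], & u > 0,\\ 0, & \text{otherwise},\end{cases}$

and discarding constants in $h_3(\tau, t)$ which will cancel and not affect the value of
$h_2(t)$, we have

(2.14) $h_3(\tau, t) \propto \tau \int_0^\infty \exp\{-\tfrac12[(u^{1/2}t - \tau)^2 + \tau^2/\gamma^2]\}\, u^{1/2}\,\psi(u \mid \nu)\,du.$

Putting $\beta^2 = 1 + 1/\gamma^2$ this becomes

$$h_3(\tau, t) \propto \tau e^{-\frac12\tau^2\beta^2} \int_0^\infty \exp\{u^{1/2}t\tau - \tfrac12 u(\nu + t^2)\}\, u^{(\nu - 1)/2}\,du$$

$$= \tau e^{-\frac12\tau^2\beta^2} \sum_{i=0}^{\infty} \frac{(t\tau)^i}{i!} \int_0^\infty u^{(\nu + i - 1)/2} \exp\{-\tfrac12 u(\nu + t^2)\}\,du$$

$$= \tau e^{-\frac12\tau^2\beta^2} \sum_{i=0}^{\infty} \frac{(t\tau)^i}{i!} \Big(\frac{2}{\nu + t^2}\Big)^{(\nu + i + 1)/2} \Big(\frac{\nu + i - 1}{2}\Big)!.$$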


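Next, putting $h_4(t) = \int_0^\infty h_3(\tau, t)\,d\tau$ and $y = t/\beta(\nu + t^2)^{1/2}$ and integrating term
by term, we get (dropping factors that are even in $t$, since they cancel in $h_2(t)$)

$$h_4(t) \propto \sum_{i=0}^{\infty} \frac{t^i}{i!}\Big(\frac{2}{\nu + t^2}\Big)^{i/2}\Big(\frac{\nu + i - 1}{2}\Big)! \int_0^\infty \tau^{i+1} e^{-\frac12\tau^2\beta^2}\,d\tau \propto \sum_{i=0}^{\infty} \frac{(2y)^i}{i!}\Big(\frac{\nu + i - 1}{2}\Big)!\Big(\frac{i}{2}\Big)!.$$

But $2^{i+1}[(i + 1)/2]!\,(i/2)! = (i + 1)!\,\pi^{1/2}$, from which

$$h_4(t) \propto \sum_{i=0}^{\infty} (i + 1)\,y^i\,\Big(\frac{\nu + i - 1}{2}\Big)!\Big/\Big(\frac{i + 1}{2}\Big)! = \frac{d}{dy}\,[\,y h_5(y) + y h_6(y)\,],$$

where $h_5(y)$ is the sum of even terms,

$$h_5(y) = \sum_{i=0}^{\infty} y^{2i}\,(p + i - \tfrac12)!\big/(i + \tfrac12)!$$

with $p = \nu/2$, and $h_6(y)$ is the sum of odd terms,

$$h_6(y) = \sum_{i=0}^{\infty} y^{2i+1}\,(p + i)!\big/(i + 1)!.$$

Working first with $h_5(y)$, it may be written as

(2.18) $h_5(y) = (p - \tfrac12)!\,F(p + \tfrac12, 1; \tfrac32; y^2)/(\tfrac12)!,$

where $F(a, b; c; x)$ denotes the hypergeometric function

$$F(a, b; c; x) = 1 + \frac{ab}{c}\,x + \frac{a(a + 1)b(b + 1)}{c(c + 1)}\,\frac{x^2}{2!} + \cdots.$$

Applying Euler's transformation

$$F(a, b; c; x) = (1 - x)^{c - a - b} F(c - a, c - b; c; x)$$

and reducing, we get

(2.19) $h_5(y) = \dfrac{1}{y}(1 - y^2)^{-p} \int_0^y (1 - u^2)^{p - 1}\,du\;(p - \tfrac12)!/(\tfrac12)!.$

Hence

(2.20) $\dfrac{d}{dy}[y h_5(y)] = (1 - y^2)^{-p-1}\Big[(1 - y^2)^p + 2py \int_0^y (1 - u^2)^{p - 1}\,du\Big]\,(p - \tfrac12)!/(\tfrac12)!.$

Next, $h_6(y)$ sums to $(p - 1)!\,[(1 - y^2)^{-p} - 1]/y$ so that

(2.21) $(d/dy)[y h_6(y)] = (1 - y^2)^{-p-1}\,2\,p!\,y.$

Treating the denominator of $h_2(t)$ in the same way and combining results we get

(2.22) $h_2(t) = h_4(t)/h_4(-t) = g(y)/g(-y),$

where

(2.23) $g(y) = (1 - y^2)^p + 2py \int_0^y (1 - u^2)^{p - 1}\,du + \dfrac{2\,p!\,(\tfrac12)!}{(p - \tfrac12)!}\,y.$

From (2.8), (2.10) and (2.22) we have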


(2.24) $\phi_*(t) = \begin{cases} 0, & g(y)/g(-y) < k,\\ 1, & g(y)/g(-y) > k.\end{cases}$

Hence, since $g(y)/g(-y)$ is monotone increasing with respect to $y$,

(2.25) $\phi_*(t) = \begin{cases} 0, & y < y_*,\\ 1, & y > y_*,\end{cases}$

where $y_* = y_*(k, \nu)$ is the solution for $y$ in $g(y)/g(-y) = k$. Finally, putting
$t_* = t_*(k, \nu, \gamma^2)$ for the solution for $t$ in $y_* = t/\beta(\nu + t^2)^{1/2}$ we have

(2.26) $\phi_*(t) = \begin{cases} 0, & t < t_*,\\ 1, & t > t_*,\end{cases}$

as was to be shown. More specific details of the computation of the significant $t$
ratios in Table 1 are given in Section 6.

EXAMPLE 1. To illustrate suppose the following: A standard treatment is
modified in the hopes of producing an increased yield. An experiment is run
giving $r$ yield observations $x_{11}, \ldots, x_{1r}$ and $x_{21}, \ldots, x_{2r}$ for the new and control
(standard) treatment respectively. It can be assumed that the respective sets of
data are random independent samples from normal populations with means $\mu_1$
and $\mu_2$ and with the same, but unknown, variance $\sigma^2$. It is required to decide
whether the new treatment is significantly superior (in yield) or not significantly
superior to the standard; whether to generally recommend it as the superior
or to withhold such a general recommendation. Type-1-like errors of recommending
a non-superior new treatment (making $d_1$ when $\delta \le 0$, where $\delta = \mu_1 - \mu_2$)
are thought to increase in seriousness in direct proportion to the degree, $-\delta$, of
inferiority involved. Type-2-like errors of failing to recommend a superior new
treatment (making $d_0$ when $\delta > 0$) are similarly thought to increase in seriousness
in direct proportion to the degree, $\delta$, of superiority involved. For any absolute
difference $\delta_0 = |\delta|$, recommendation of an inferior new treatment with $\delta = -\delta_0$
is considered $k$ times as serious as the corresponding failure to recommend a
superior new treatment with $\delta = \delta_0$. In the averaging of risks it is desired to
weight risks symmetrically at $\delta = \pm\delta_0$ for all possible differences $\delta_0 \ge 0$, with
weights decreasing with respect to $\delta_0$ as given by a normal density for $\tau =
\delta/(2\sigma^2/r)^{1/2}$ with mean zero and variance $\gamma^2$. A minimum-average-risk rule is
required which would be invariant with respect to any changes of scale or location
in the observations.

Because of sufficiency and invariance considerations the required rule can be
restricted to depend on the observations through only the $t$ ratio

(2.27) $t = (\bar x_1 - \bar x_2)/(2s^2/r)^{1/2},$

where $\bar x_1$ and $\bar x_2$ are the respective sample means and $s^2$ is the pooled within-sample
variance estimate

(2.28) $s^2 = \sum_{i=1}^{2}\sum_{j=1}^{r} (x_{ij} - \bar x_i)^2/2(r - 1).$

The required rule is then given by (2.26) where $t_* = t_*(k, \nu, \gamma^2)$ with
$\nu = 2(r - 1)$.
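
The statistic of this example is easy to compute directly. The sketch below evaluates the $t$ ratio (2.27), the pooled variance (2.28) and the degrees of freedom for two samples of equal size $r$; the function name and the yields are hypothetical.

```python
import math
from statistics import fmean

def pooled_t(x1, x2):
    """t ratio (2.27) with pooled variance (2.28) for two samples of equal size r.

    Returns (t, nu) with nu = 2(r - 1) error degrees of freedom.
    """
    if len(x1) != len(x2):
        raise ValueError("this sketch assumes equal sample sizes")
    r = len(x1)
    m1, m2 = fmean(x1), fmean(x2)
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    s2 = ss / (2 * (r - 1))              # pooled within-sample variance
    t = (m1 - m2) / math.sqrt(2 * s2 / r)
    return t, 2 * (r - 1)

# Hypothetical yields: new treatment, then control.
t, nu = pooled_t([5.0, 7.0, 9.0], [1.0, 2.0, 3.0])
print(t, nu)
```

The new treatment would then be recommended when the computed $t$ exceeds the tabled significant value for the chosen loss ratio, degrees of freedom and prior variance.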

In practice a similar problem could well arise in which it is desired to use a
non-zero mean $\mu_\delta$ for the prior density. With infinite error degrees of freedom
($\nu = \infty$) the significant $t$ ratio for such an asymmetric problem can be shown
to be given by subtracting a correction of $\mu_\delta/(2\sigma^2/r)^{1/2}$ from the corresponding
value in Table 1. For $\nu$ finite the derivation is more difficult. The use of a similar
correction of $\mu_\delta/(2s^2/r)^{1/2}$ would no doubt suffice however for practical purposes
except for very small values of $\nu$. Since the extensions of this problem in the later
sections concern only the symmetric case, a more detailed treatment of the
asymmetric case will not be taken up here.

From the roles they play the parameters $k$, $\nu$, and $\gamma^2$ determining the
minimum-average-risk significant $t$ values may be usefully termed the loss or error
seriousness ratio, the error degrees of freedom and the risk-weighting variance ratio
respectively. Before going on it is of interest to note in Table 1 that a loss ratio of
100 (log $k$ = 2), infinite error degrees of freedom ($\nu = \infty$), and a risk weighting
variance ratio of 3 ($\gamma^2 = 3$) give a $t_*$ of 1.987, close to the 1.960 of a .025 level
test of $H_0\colon \tau \le 0$.
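
The value 1.987 quoted for infinite degrees of freedom can be checked numerically. The sketch below is not the series computation of this section: it assumes the limiting case $\nu = \infty$, in which $t$ given $\tau$ is taken to be normal with mean $\tau$ and unit variance, so that the posterior of $\tau$ is normal and the ratio of the two integrals in (2.10) has a closed form. The function names are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def risk_ratio(t, gamma2):
    """Ratio of the integral of tau over tau > 0 to that of |tau| over tau < 0,
    weights proportional to the normal posterior of tau given t (nu = infinity).
    """
    # Posterior mean / posterior s.d. = t * sqrt(gamma^2 / (1 + gamma^2)).
    z = t * math.sqrt(gamma2 / (1.0 + gamma2))
    upper = normal_pdf(z) + z * normal_cdf(z)
    lower = normal_pdf(z) - z * normal_cdf(-z)
    return upper / lower

def critical_t(k, gamma2, lo=0.0, hi=20.0, tol=1e-10):
    """Solve risk_ratio(t) = k by bisection (the ratio increases with t)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if risk_ratio(mid, gamma2) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(critical_t(100.0, 3.0), 3))
```

Under these assumptions the bisection reproduces the tabled value for log $k$ = 2 and $\gamma^2 = 3$ to the three decimals quoted.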


3. A related three-decision Student-t problem. Given a similar observed $t$
value, a problem related to that of Section 2 is one of choosing between the three
common decisions

(3.1) $d_0$: decide $|\tau| \le \Delta$, $d_1$: decide $\tau > \Delta$ and $d_2$: decide $\tau < -\Delta$,

where, as before, $\Delta$ is some unspecified positive boundary value. In the language
of the experimenter $d_0$, $d_1$ and $d_2$ are the decisions that $\tau$ is not significantly
different from zero, that $\tau$ is significantly greater than zero and that $\tau$ is
significantly less than zero, respectively.

Our second result is a Bayes rule for this problem with respect to a similar
linear loss function

(3.2)
$$L_0^{(2)}(\tau) = L^{(2)}(\tau, d_0) = \begin{cases} 0, & \tau = 0,\\ c_0|\tau|, & \tau \ne 0,\end{cases}$$

$$L_1^{(2)}(\tau) = L^{(2)}(\tau, d_1) = \begin{cases} c_1|\tau|, & \tau \le 0,\\ 0, & \tau > 0,\end{cases}$$

$$L_2^{(2)}(\tau) = L^{(2)}(\tau, d_2) = \begin{cases} 0, & \tau \le 0,\\ c_1\tau, & \tau > 0,\end{cases}$$

where $c_0$ and $c_1$ are positive constants such that $c_1 - c_0 > c_0$ and with respect to
the same normal prior density (2.4) for $\tau$. The rule is

(3.3) $\phi_*^{(2)}(t) = (\phi_{0*}^{(2)}(t)\ \phi_{1*}^{(2)}(t)\ \phi_{2*}^{(2)}(t)) = \begin{cases} (1\,0\,0), & |t| < t_*,\\ (0\,1\,0), & t > t_*,\\ (0\,0\,1), & t < -t_*,\end{cases}$

where the significant $t$ ratio $t_* = t_*(k, \nu, \gamma^2)$ is the same as that of the previous
section with the loss ratio now given by $k = (c_1/c_0) - 1$, and where $\phi_i^{(2)}(t) = 0$
or 1 denotes the not making or making of the decision $d_i$, $i = 0, 1, 2$.
This result can be obtained as follows: First the three-decision subset system

(3.4) $\omega_0\colon |\tau| \le \Delta, \quad \omega_1\colon \tau > \Delta, \quad \omega_2\colon \tau < -\Delta,$

can be expressed as the restricted product (the full product less empty
intersections, see Lehmann [7]) of two component two-decision subset systems like that
of the previous problem in Section 2, namely

(3.5)
Component system for $+\tau$: $\omega_0^+\colon \tau \le \Delta, \quad \omega_1^+\colon \tau > \Delta,$
Component system for $-\tau$: $\omega_0^-\colon -\tau \le \Delta, \quad \omega_1^-\colon -\tau > \Delta.$

Thus

(3.6) $\omega_0 = \omega_0^+ \cap \omega_0^-, \quad \omega_1 = \omega_1^+ \cap \omega_0^-, \quad \omega_2 = \omega_0^+ \cap \omega_1^-.$

The intersection $\omega_1^+ \cap \omega_1^-$ is excluded since it is empty. Put in other words, each
of the main decisions is equivalent to two joint component decisions

(3.7) $d_0$ to $d_0^+$ with $d_0^-$, $d_1$ to $d_1^+$ with $d_0^-$ and $d_2$ to $d_0^+$ with $d_1^-$;

the joint decision $d_1^+$ with $d_1^-$ is excluded since it has mutually incompatible
components; $d_i^a$ is the decision $\tau \in \omega_i^a$; $a = +, -$; $i = 0, 1$.

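Second, by putting $k_1 = c_1 - c_0$ and $k_0 = c_0$ the losses for the main decisions
can be expressed as the sums of losses for its component decisions as given by
the two-decision loss function (2.3) in the previous section. Demonstrating this

(3.8)
$$L_0^{(2)}(\tau) = L_0(\tau) + L_0(-\tau) = \begin{cases} 0 + k_0(-\tau) = c_0|\tau|, & \tau < 0,\\ 0 + 0 = 0, & \tau = 0,\\ k_0\tau + 0 = c_0|\tau|, & \tau > 0,\end{cases}$$

$$L_1^{(2)}(\tau) = L_1(\tau) + L_0(-\tau) = \begin{cases} k_1|\tau| + k_0(-\tau) = c_1|\tau|, & \tau < 0,\\ k_1|\tau| + 0 = c_1|\tau|, & \tau = 0,\\ 0 + 0 = 0, & \tau > 0,\end{cases}$$

$$L_2^{(2)}(\tau) = L_0(\tau) + L_1(-\tau) = \begin{cases} 0 + 0 = 0, & \tau < 0,\\ 0 + k_1|-\tau| = c_1\tau, & \tau = 0,\\ k_0\tau + k_1|-\tau| = c_1\tau, & \tau > 0.\end{cases}$$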
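
The additivity in (3.8) can be verified mechanically. The sketch below codes the two-decision losses of (2.3) and checks that their sums reproduce losses proportional to $c_0 = k_0$ and $c_1 = k_0 + k_1$; the function names and the numerical values of $k_0$, $k_1$ are hypothetical.

```python
def loss_d0(tau, k0):
    """Two-decision loss of d0: k0 * tau when tau > 0, else 0."""
    return k0 * tau if tau > 0 else 0.0

def loss_d1(tau, k1):
    """Two-decision loss of d1: k1 * |tau| when tau <= 0, else 0."""
    return k1 * abs(tau) if tau <= 0 else 0.0

def three_decision_losses(tau, k0, k1):
    """Losses of (d0, d1, d2) built as sums of component losses."""
    return (loss_d0(tau, k0) + loss_d0(-tau, k0),   # d0 = d0+ with d0-
            loss_d1(tau, k1) + loss_d0(-tau, k0),   # d1 = d1+ with d0-
            loss_d0(tau, k0) + loss_d1(-tau, k1))   # d2 = d0+ with d1-

k0, k1 = 1.0, 5.0              # assumed values, so c0 = 1 and c1 = 6
for tau in (-2.0, 0.0, 3.0):
    print(tau, three_decision_losses(tau, k0, k1))
```

For negative $\tau$ the second component equals $c_1|\tau|$ and the third is zero; for positive $\tau$ the roles are reversed.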


Next, any rule $\phi^{(2)}(t)$ for the three-decision problem can also be expressed in
terms of two component two-decision rules. For this purpose it is convenient to
first re-express the two-decision function $\phi(t)$ in the two-element vector form

(3.9) $\phi(t) = (\phi_0(t)\ \phi_1(t)),$

where $\phi_0(t) = 1 - \phi(t)$ and $\phi_1(t) = \phi(t)$. In this form, for example, the Bayes
rule $\phi_*(t)$ of the previous section appears as

(3.10) $\phi_*(t) = \begin{cases} (0\,1), & t > t_*,\\ (1\,0), & t < t_*.\end{cases}$

With this vector notation (3.9) for the two-decision function and the components
(3.7) of the three decisions in mind we can write

(3.11) $\phi^{(2)}(t) = (\phi_0^{(2)}(t)\ \phi_1^{(2)}(t)\ \phi_2^{(2)}(t)) = (\phi_0^+(t)\phi_0^-(t)\ \ \phi_1^+(t)\phi_0^-(t)\ \ \phi_0^+(t)\phi_1^-(t)),$

where $\phi_i^a(t) = 0$ or 1 denotes the not making or making of decision $d_i^a$; $a =
+, -$; $i = 0, 1$.
Now, working with the average risk of $\phi^{(2)}(t)$ we have

(3.12)
$$A(\xi, \phi^{(2)}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{i=0}^{2} L_i^{(2)}(\tau)\,\phi_i^{(2)}(t)\, f(t \mid \tau)\,dt\,\xi(\tau)\,d\tau$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{i=0}^{1}\sum_{j=0}^{1} (L_i(\tau) + L_j(-\tau))\,\phi_i^+(t)\,\phi_j^-(t)\, f(t \mid \tau)\,dt\,\xi(\tau)\,d\tau,$$

provided that the condition

(3.13) $\phi_1^+(t)\,\phi_1^-(t) = 0, \quad -\infty < t < \infty,$

is satisfied. (Following Lehmann [7] this may be termed a compatibility
condition since, if it were not satisfied, the component rules would give incompatible
decisions). Assuming pro tempore that this is satisfied, the average risk readily
reduces to

(3.14) $A(\xi, \phi^{(2)}) = A(\xi, \phi^+(t)) + A(\xi, \phi^-(t)),$

where the component average risks $A(\xi, \phi^a(t))$, $a = +, -$, are the same
functions of $t$ and $-t$ as was $A(\xi, \phi) = A(\xi, \phi(t))$ of $t$ in Section 2. To minimize
(3.14) it is sufficient to minimize the component average risks separately subject
still, of course, to the compatibility condition (3.13). From the result of the
previous section, as expressed in (3.10), the component solutions are

(3.15) $\phi_*^+(t) = \begin{cases} (1\,0), & t < t_*,\\ (0\,1), & t > t_*,\end{cases} \qquad \phi_*^-(t) = \begin{cases} (1\,0), & -t < t_*,\\ (0\,1), & -t > t_*.\end{cases}$

Since $\phi_{1*}^+(t) = 1$ only in $t > t_*$, $\phi_{1*}^-(t) = 1$ only in $-t > t_* \Rightarrow t < -t_*$, and
since $t_*$ is positive (from $k_1 > k_0$) we do have $\phi_{1*}^+(t)\,\phi_{1*}^-(t) = 0$ for all $t$, that is,
the solutions are compatible. Thus the required Bayes rule is given by

(3.16) $\phi_*^{(2)}(t) = (\phi_{0*}^+(t)\phi_{0*}^-(t)\ \ \phi_{1*}^+(t)\phi_{0*}^-(t)\ \ \phi_{0*}^+(t)\phi_{1*}^-(t)) = \begin{cases} (1\,0\,0), & (t < t_*)(t > -t_*) \Rightarrow |t| < t_*,\\ (0\,1\,0), & (t > t_*)(t > -t_*) \Rightarrow t > t_*,\\ (0\,0\,1), & (t < t_*)(t < -t_*) \Rightarrow t < -t_*,\end{cases}$

as was to be shown.
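
The construction of the three-decision rule from two component rules can be sketched as follows. The significant value is passed in as a given positive number (1.987 below is only the $\nu = \infty$ figure quoted earlier); the function names are hypothetical, and the boundary case where $t$ equals the significant value exactly, which the rule leaves open, is assigned to the null decision here.

```python
def two_decision(t, t_star):
    """Component rule in vector form (phi_0, phi_1)."""
    return (0, 1) if t > t_star else (1, 0)

def three_decision(t, t_star):
    """Combine the component rules for +t and -t as products of indicators."""
    plus = two_decision(t, t_star)
    minus = two_decision(-t, t_star)
    # Compatibility: both components cannot assert significance at once
    # when t_star is positive.
    assert plus[1] * minus[1] == 0
    return (plus[0] * minus[0], plus[1] * minus[0], plus[0] * minus[1])

for t in (0.5, 2.5, -2.5):
    print(t, three_decision(t, 1.987))
```

The compatibility assertion holds automatically for a positive significant value, which is the point made in the derivation.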

EXAMPLE 2. Suppose that two samples of yields have been observed as in
Example 1 except that now they are for two new treatments. It is required to
decide whether the first can be recommended as the superior (in yield), whether
the second can, or whether to withhold recommendations on both. Errors of
making a wrong recommendation or of failing to make an appropriate
recommendation are again scorable in direct proportion to the degree of inferiority or
superiority involved respectively. For any difference that may exist the error of
a wrong recommendation is considered $c$ times as serious as that of just failing
to make the right recommendation. The requirements for weighting of risks and
invariance are the same. The required rule is then given by (3.16) where $t$ is as
obtained in (2.27) in Example 1, $t_* = t_*(k, \nu, \gamma^2)$ with $k = c - 1$ and $\nu = 2(r - 1)$.
The subtraction of one from the loss ratio $c$ will be trivial and unnecessary in
most practical situations with $c$ not small. The need for it here and not in
Example 1, it may be said, comes from the fact that a wrong recommendation now
includes implicitly a failure to make an appropriate recommendation as well.


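4. A symmetric multiple comparisons problem. Given $N = \tfrac12 n(n - 1)$
$t$ statistics of the form

(4.1) $t_{pq} = (\hat\mu_p - \hat\mu_q)/s_{\hat\mu_p - \hat\mu_q}, \quad pq \in N,$

with non-centrality parameters of the form

(4.2) $\tau_{pq} = (\mu_p - \mu_q)/\sigma_{\hat\mu_p - \hat\mu_q}, \quad pq \in N,$

where $N$ is used for convenience to denote the set of pairs

$$\{1, 2;\ 1, 3;\ \cdots;\ (n - 1), n\}$$

as well as its size, a common multiple comparisons problem is that of choosing
between the three decisions

(4.3) $d_{pq}^0\colon \tau_{pq} \in \omega_{pq}^0, \quad d_{pq}^1\colon \tau_{pq} \in \omega_{pq}^1, \quad d_{pq}^2\colon \tau_{pq} \in \omega_{pq}^2,$

simultaneously for all $pq \in N$, where the subsets are of the previous (3.1) form

(4.4) $\omega_{pq}^0\colon |\tau_{pq}| \le \Delta, \quad \omega_{pq}^1\colon \tau_{pq} > \Delta, \quad \omega_{pq}^2\colon \tau_{pq} < -\Delta.$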


The joint density of the $t_{pq}$'s is the one that would result, for example, from
the common assumptions (a) $\hat\mu_1, \ldots, \hat\mu_n$ are normal independent variables with
means $\mu_1, \ldots, \mu_n$ and the same but unknown variance $\sigma_{\hat\mu}^2$ and (b) $s_{\hat\mu}^2$ is an
estimator of $\sigma_{\hat\mu}^2$ with $\nu$ degrees of freedom such that $u = s_{\hat\mu}^2/\sigma_{\hat\mu}^2$ has the $\chi^2$-related
density (2.13) independently of $\hat\mu_1, \ldots, \hat\mu_n$, from which $\sigma_{\hat\mu_p - \hat\mu_q} = 2^{1/2}\sigma_{\hat\mu}$ and
$s_{\hat\mu_p - \hat\mu_q} = 2^{1/2}s_{\hat\mu}$. If we put $y$ for any vector of $(n - 1)$ orthogonal normalized
comparisons among the estimates $\hat\mu_i$, that is, $y = A\hat\mu$ where $AA' = I_{n-1}$, $Aj = 0$
($j$ being a vector of ones), and $\hat\mu = (\hat\mu_1 \cdots \hat\mu_n)'$, then the density of the $t_{pq}$'s can
be represented conveniently in terms of that of the $(n - 1)$-element $t$ vector


(4.5) $t = y/s_{\hat\mu}$

depending on the corresponding $(n - 1)$-element non-centrality parameter
vector

(4.6) $\tau = \eta/\sigma_{\hat\mu},$

where $\eta = A\mu$ and $\mu = (\mu_1 \cdots \mu_n)'$. This can readily be expressed as


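(4.7) $f_n(t \mid \tau) = \int_0^\infty (2\pi)^{-(n-1)/2}\, u^{(n-1)/2}\, e^{-\frac12 (u^{1/2}t - \tau)'(u^{1/2}t - \tau)}\,\psi(u \mid \nu)\,du, \quad -\infty < t < \infty,$

where $\psi(u \mid \nu)$ is the $\chi^2$-related density (2.13).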
Our main result is a Bayes rule for this problem with respect to generalizations 
of the linear additive loss functions and normal prior density used in the previous 
sections. 


More explicitly, the subsets $\omega_0, \omega_1, \ldots, \omega_{M-1}$ of the multiple comparisons
problem are the non-empty intersections

(4.8) $\omega_i = \bigcap_{pq \in N} \omega_{pq}^{j_{pq}}, \quad j_{pq} = 0, 1 \text{ or } 2, \quad i = 0, \ldots, M - 1,$

of the subsets of all the component three-decision problems involved. The
decision system consists of the $M$ corresponding decisions

(4.9) $d_i\colon \tau \in \omega_i, \quad i = 0, \ldots, M - 1.$

For example, thinking of the component subset systems (4.4) in the form

(4.10) $\omega_{pq}^0\colon (|\mu_p - \mu_q| \le \Delta'), \quad \omega_{pq}^1\colon (\mu_q < \mu_p - \Delta'), \quad \omega_{pq}^2\colon (\mu_p < \mu_q - \Delta'),$

(where $\Delta' = \Delta 2^{1/2}\sigma_{\hat\mu}$) and using the corresponding more graphic notation

(4.11) $(\underline{p, q}), \quad (q, p), \quad (p, q),$

in place of $\omega_{pq}^0$, $\omega_{pq}^1$, $\omega_{pq}^2$ respectively, the $M = 19$ multiple comparisons subsets


(2,1) (1.2) 
(2,3) = (1,2, w, = (2,3, 1) = (1, 3,2) 
(1,3) (3, 2) 3, 1,2 = (1,3, 2) 
(2,3) = ‘ ‘ 
(2,3) ‘ 
(3,1) (3,2) = (3,1,2 ‘ = (3, 1,2) 
(2,3) . . 
(2,3) ,; = (1,2,: ; = (1, 2,3) 
(1,3) (3,2 . . (1, 3, 2) 
(2,3) 2, ¢ wi7 = (2, 1,3) = (1,2,3) 


in the case $n = 3$ may be developed as in (4.12). In general, following Duncan
[3], the notation $(i, j, k, \cdots)$ may be used to denote subsets in which the
corresponding means $\mu_i, \mu_j, \mu_k, \cdots$ are ranked in significantly ascending order from
left to right (i.e. with differences $|\delta| > \Delta'$) except that subscripts underscored by
a common line denote pairs of means for which the difference is not significant
(i.e. $|\delta| \le \Delta'$). Thus $\omega = (1, 3, 2)$, with 1, 3 and 3, 2 each underscored by a common
line, is the subset in which $\mu_1$ is significantly less
than $\mu_2$ but $\mu_1$ and $\mu_2$ each do not differ significantly from $\mu_3$. The remaining
$3^3 - 19 = 8$ intersections, e.g., $(2, 1)(1, 3)(3, 2)$, of the component subsets are
not included in the multiple comparisons system since they are empty.

Choosing the elements of $\tau$ as $\tau_1 = (\mu_1 - \mu_2)/2^{1/2}\sigma_{\hat\mu}$ for abscissa and $\tau_2 =
(\mu_1 + \mu_2 - 2\mu_3)/6^{1/2}\sigma_{\hat\mu}$ for ordinate the parameter subsets, in the case $n = 3$, may
be represented as shown in Figure 2 of [3], where the vertical lines are $\tau_{12} = \pm\Delta$,
the lines from top left to bottom right are $\tau_{13} = \pm\Delta$ and those from bottom left
to top right are $\tau_{23} = \pm\Delta$.

Referring back for a moment, the need for the definition (2.1) instead of (2.2) 
for the decisions of our initial subcomponent problem can now be seen more 
clearly in the formation of the subsets (4.12). If (2.2) were used the six subsets 
Ww, , We , &3 , Ws , W7 ANd w; of the form (?,7, *) would be eliminated. These however 
are useful members of the system, hence the need for some such definition as 
(2.1) to retain them. 

The size M of the multiple comparisons subset system increases rapidly with n 
the number of means involved. In the next case n = 4, for example, the numbers 
of the various forms of subsets, using the same notation, are 


Gh & F Rees ae ee a ee Re aes ee 
+++ 12, (4,9, k, 1) --- 12, (4,9, kL) --- 12, (4,7, k, 1) «++ 24, 
(1,J,k,l) --- 12, (0,9, kL) --- 12, (4,9, k,1) --+ 12, (4,9, k, 1) 
- 24. (4,7,k, 1) --- 6, (4,9, Kl) --- 24, (4,9, 0) --- 24. 
making M = 183 in all. 
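
The counts $M = 19$ and $M = 183$ can be checked by brute force. The sketch below enumerates all $3^N$ assignments of component decisions and keeps those whose defining inequalities on the means can hold simultaneously. The feasibility test is an illustration of my own choosing, not a method from the paper: each assignment is written as a system of difference constraints and checked for negative cycles with a Floyd-Warshall pass, strict inequalities being tracked by a second "infinitesimal" component.

```python
import math
from itertools import combinations, product

def is_nonempty(n, decisions, delta=1):
    """True when the intersection of the chosen component subsets is non-empty.

    decisions[idx] in {0, 1, 2} refers to the idx-th pair (p, q), p < q:
      0: |mu_p - mu_q| <= delta, 1: mu_q < mu_p - delta, 2: mu_p < mu_q - delta.
    dist[u][v] = (w, e) encodes mu_v - mu_u <= w + e*eps (e = -1 marks "<").
    """
    inf = (math.inf, 0)
    dist = [[(0, 0) if i == j else inf for j in range(n)] for i in range(n)]
    for (p, q), dec in zip(combinations(range(n), 2), decisions):
        if dec == 0:
            dist[q][p] = (delta, 0)      # mu_p - mu_q <= delta
            dist[p][q] = (delta, 0)      # mu_q - mu_p <= delta
        elif dec == 1:
            dist[p][q] = (-delta, -1)    # mu_q - mu_p < -delta
        else:
            dist[q][p] = (-delta, -1)    # mu_p - mu_q < -delta
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a, b = dist[i][k], dist[k][j]
                if math.isinf(a[0]) or math.isinf(b[0]):
                    continue
                cand = (a[0] + b[0], a[1] + b[1])
                if cand < dist[i][j]:
                    dist[i][j] = cand
    # A cycle of negative total weight means the constraints contradict.
    return all(dist[i][i] >= (0, 0) for i in range(n))

def count_subsets(n):
    n_pairs = n * (n - 1) // 2
    return sum(is_nonempty(n, d) for d in product(range(3), repeat=n_pairs))

print(count_subsets(3), count_subsets(4))
```

For $n = 3$ this leaves 19 of the 27 assignments, in agreement with the 8 empty intersections noted above, and for $n = 4$ it gives the 183 subsets just enumerated.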
The losses are defined as the sum of the losses (3.2) for each of the component
decisions involved; that is

(4.14) $L_i^{(n)}(\tau) = L^{(n)}(\tau, d_i) = \sum_{pq \in N} L_{j_{pq}}^{(2)}(\tau_{pq}), \quad j_{pq} = 0, 1 \text{ or } 2;\ i = 0, \ldots, M - 1.$


Thus, suppose for example in the case $n = 3$, $\mu/2^{1/2}\sigma_{\hat\mu} = (\mu_1\ \mu_2\ \mu_3)'/2^{1/2}\sigma_{\hat\mu}$ represents
the expected standardized yields of three manurial treatments on a particular
agricultural crop. Then the loss $L_{14}^{(3)}(\tau)$ at $\mu/2^{1/2}\sigma_{\hat\mu} = (10, 12, 8)'$ incurred by the
decision $d_{14}\colon \tau \in (1, \underline{2, 3})$ is

(4.15)
$$L_{14}^{(3)}(\tau) = L_2^{(2)}(\tau_{12}) + L_2^{(2)}(\tau_{13}) + L_0^{(2)}(\tau_{23})$$
$$= L_2^{(2)}(-2) + L_2^{(2)}(2) + L_0^{(2)}(4)$$
$$= 0 + 2c_1 + 4c_0 = 2k_1 + 6k_0.$$

The third contribution $4c_0 = 4k_0$ enters, it may be said, because the decision $d_{14}$
has failed to recognize the 4-unit superiority of treatment two over treatment
three. The second contribution $2c_1 = 2k_1 + 2k_0$ enters, because, in similar terms,
$d_{14}$ not only fails to recognize the 2-unit superiority of treatment one over
treatment three, incurring a loss of $2k_0$, it also commits the more serious error of
ranking treatment three above treatment one, incurring an additional loss of
$2k_1$. No loss is incurred by the first component decision because $d_{14}$ ranks
treatment two correctly above treatment one.

The prior density for averaging risks over all points in the (n − 1)-dimensional 
space for $\tau$ is the simple normal density function 

$$\xi_n(\tau \mid \gamma^2) = (2\pi\gamma^2)^{-(n-1)/2}\, e^{-\tau'\tau/2\gamma^2}, \qquad -\infty < \tau < \infty. \tag{4.16}$$

This is one which would result, for example, from assuming that the means 
$\mu_1, \dots, \mu_n$ have independent normal prior distributions with the same mean $\theta$ 
and same variance $\sigma_\mu^2 = \gamma^2\sigma^2$. 
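From the additive losses assumption it follows as before that the average risk 
for any decision rule $\phi^{(n)}(t) = (\phi_0^{(n)}(t) \cdots \phi_{M-1}^{(n)}(t))$ may be expressed as the 
sum of average risks for component three-decision rules 

$$\phi^{pq}(t) = (\phi_0^{pq}(t)\ \ \phi_1^{pq}(t)\ \ \phi_2^{pq}(t))$$

provided again that the component rules are compatible. The steps may be 
written 

$$\begin{aligned}
A(\xi_n, \phi^{(n)}) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \sum_{i=0}^{M-1} L_i^{(n)}(\tau)\,\phi_i^{(n)}(t)\, f_n(t \mid \tau)\, dt\ \xi_n(\tau)\, d\tau \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \sum_{j_{12}=0}^{2} \cdots \sum_{j_{n-1,n}=0}^{2} \bigl[L^{(3)}_{j_{12}}(\tau_{12}) + \cdots + L^{(3)}_{j_{n-1,n}}(\tau_{n-1,n})\bigr] \\
&\qquad\qquad \cdot\ \phi^{12}_{j_{12}}(t) \cdots \phi^{n-1,n}_{j_{n-1,n}}(t)\, f_n(t \mid \tau)\, dt\ \xi_n(\tau)\, d\tau \\
&= \sum_{pq \in N_n} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \sum_{j=0}^{2} L^{(3)}_j(\tau_{pq})\,\phi^{pq}_j(t)\, f_n(t \mid \tau)\, dt\ \xi_n(\tau)\, d\tau \\
&= \sum_{pq \in N_n} A(\xi_n, \phi^{pq}(t)).
\end{aligned} \tag{4.17}$$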

The compatibility condition may be written as 

$$\prod_{pq \in N_n} \phi^{pq}_{j_{pq}}(t) = 0, \qquad j_{pq} = 0, 1 \text{ or } 2,\ -\infty < t < \infty, \tag{4.18}$$

for all products leading to incompatible decisions not included in 
$\{d_i;\ i = 0, \dots, M-1\}$, and is required in proceeding from the first to the 
second line of (4.17). 
It then follows as before that the Bayes rule 

$$\phi^{(n)*}(t) = (\phi_0^{(n)*}(t) \cdots \phi_{M-1}^{(n)*}(t)) \tag{4.19}$$

for the multiple comparisons problem is formed by the products 

$$\phi_i^{(n)*}(t) = \prod_{pq \in N_n} \phi^{pq*}_{j_{pq}}(t), \qquad j_{pq} = 0, 1 \text{ or } 2, \tag{4.20}$$

of the elements of the Bayes rules $\phi^{pq*}(t)$ minimizing 

$$A(\xi_n, \phi^{pq}(t)), \qquad pq \in N_n, \tag{4.21}$$

provided these are compatible. 


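The first step in deriving $\phi^{pq*}(t)$ is practically identical with that ((3.12) to 
(3.14)) in Section 3. Thus 

$$\phi^{pq*}(t) = \bigl(\phi_0^{pq+*}(t)\phi_0^{pq-*}(t)\ \ \ \phi_1^{pq+*}(t)\phi_0^{pq-*}(t)\ \ \ \phi_0^{pq+*}(t)\phi_1^{pq-*}(t)\bigr) \tag{4.22}$$

where, dropping the superscripts $pq+$ or $pq-$, $\phi^*(t) = (\phi_0^*(t)\ \phi_1^*(t))$ 
now minimizes a subcomponent average risk of the form 

$$A(\xi_n, \phi(t)) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \sum_{i=0}^{1} L_i(\tau_{pq})\,\phi_i(t)\, f_n(t \mid \tau)\, dt\ \xi_n(\tau)\, d\tau. \tag{4.23}$$

The elements of $\tau$ may be chosen so that $\tau_{pq}$ is the first element $\tau_1$ of $\tau$. The 
compatibility condition 

$$\phi_1^{pq+*}(t)\,\phi_1^{pq-*}(t) = 0 \tag{4.24}$$

must again be met. 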

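The work of minimizing (4.23) follows closely that of minimizing (2.6) in 
Section 2 except that now the sample and parameter spaces have additional 
dimensions, (n − 2) each. Dropping the subscript pq from $\tau_{pq}$, the steps follow through 
with obvious changes till we get to 

$$h_3^{(n)}(\tau, t, \nu) \propto \int_0^\infty\!\!\int \exp\Bigl\{-\tfrac12 \sum_{i=1}^{n-1}\bigl[(ut_i - \tau_i)^2 + \tau_i^2/\gamma^2\bigr]\Bigr\}\, u^{n-1}\, p(u \mid \nu)\, d\tau_2\, du \tag{4.25}$$

where the first integration is with respect to $\tau_2$, defined as the last n − 2 elements 
of $\tau = (\tau_1\ \tau_2')'$. This appears in place of $h_3(\tau, t)$ as in (2.14) before, which may now 
be denoted by $h_3^{(2)}(\tau, t, \nu)$. On integrating with respect to $\tau_2$ we get 

$$\begin{aligned}
h_3^{(n)}(\tau, t, \nu) &\propto \int_0^\infty \exp\bigl\{-\tfrac12[(ut - \tau)^2 + \tau^2/\gamma^2]\bigr\}\, \exp\Bigl\{-\tfrac12 \sum_{i=2}^{n-1} u^2 t_i^2/(1+\gamma^2)\Bigr\}\, u^{n-1}\, u^{\nu-1} e^{-\frac12 \nu u^2}\, du \\
&\propto \int_0^\infty \exp\bigl\{-\tfrac12[(wt' - \tau)^2 + \tau^2/\gamma^2]\bigr\}\, w^{\nu'} e^{-\frac12 \nu' w^2}\, dw
\end{aligned} \tag{4.26}$$

where $t = t_1$ is the first element of $t$ and is thus $t = t_{pq}$, 

$$w = u/R, \qquad t' = Rt, \qquad R = \Bigl\{\nu' \Big/ \Bigl[\nu + \sum_{i=2}^{n-1} t_i^2/(1+\gamma^2)\Bigr]\Bigr\}^{1/2} \tag{4.27}$$

and $\nu' = \nu + n - 2$. Thus 

$$h_3^{(n)}(\tau, t, \nu) \propto \int_0^\infty \exp\bigl\{-\tfrac12[(wt' - \tau)^2 + \tau^2/\gamma^2]\bigr\}\, w\, p(w \mid \nu')\, dw = h_3^{(2)}(\tau, t', \nu'). \tag{4.28}$$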
Making a direct application of the remaining derivation in Section 2 it now 
follows that 

$$\phi^{*}(t) = \begin{cases} (1, 0), & (t' < t^*) \equiv (t < t^*/R), \\ (0, 1), & (t' > t^*) \equiv (t > t^*/R), \end{cases} \tag{4.29}$$

where $t^* = t^*(k, \nu', \gamma^2)$ is the same significant t value as before except that its 
degrees of freedom are now $\nu' = \nu + n - 2$. 
Since $t^*/R > 0$, the compatibility condition (4.24) is met and applying (4.22) 
the component average risk (4.21) is minimized by 

$$\phi^{pq*}(t) = \begin{cases} (100), & (|t'_{pq}| < t^*) \equiv (|t_{pq}| < t^*/R), \\ (010), & (t'_{pq} > t^*) \equiv (t_{pq} > t^*/R), \\ (001), & (t'_{pq} < -t^*) \equiv (t_{pq} < -t^*/R), \end{cases} \tag{4.30}$$

where $t'_{pq} = Rt_{pq}$ for all $pq \in N_n$. Again since $t^*/R > 0$, the preceding compatibility 
conditions (4.18) are met and applying (4.19) the Bayes rule for the multiple 
comparisons problem is given by the simultaneous application (4.20) of all 
N = n(n − 1)/2 of the three-decision Bayes rules (4.30). 
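
As a minimal sketch of how the rescaling (4.27) and the rule (4.30) combine for a single pair pq, the Python function below computes R, forms $t' = Rt_{pq}$ and returns the three-decision indicator. The name `component_decision` and its argument layout are illustrative assumptions, not part of the paper, and the significant value `t_star` is assumed to be supplied (for example from Table 1) rather than computed here.

```python
import math

def component_decision(t, nu, gamma2, t_star):
    """Three-decision rule (4.30) for one pair pq (illustrative sketch).

    t      : list holding the n-1 elements of the t vector, with t[0] = t_pq
    nu     : error degrees of freedom
    gamma2 : risk-weighting variance ratio gamma^2
    t_star : significant t value t*(k, nu', gamma^2), assumed to be given
    """
    nu_prime = nu + len(t) - 1                  # nu' = nu + n - 2, as len(t) = n - 1
    extra = sum(ti * ti for ti in t[1:]) / (1.0 + gamma2)
    R = math.sqrt(nu_prime / (nu + extra))      # equation (4.27)
    t_prime = R * t[0]
    if t_prime > t_star:
        return (0, 1, 0)    # mean p ranked above mean q
    if t_prime < -t_star:
        return (0, 0, 1)    # mean q ranked above mean p
    # |t'| <= t*: no ranking (the boundary case is assigned here by convention)
    return (1, 0, 0)
```

With n = 2 the list `t` has a single element, R equals 1 and the rule reduces to a comparison of $t_{pq}$ with $t^*$ on $\nu$ degrees of freedom.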

EXAMPLE 3. Suppose that samples of yields like those of Example 2 have 
been obtained for n new treatments. For each and every pair (a, b) of treatments 
it is required to decide whether a can be recommended as the superior, whether 
b can or whether to withhold recommendations on both. Losses are scorable with 
respect to each pair of treatments as in Example 2, the loss ratio c being the same 
for all pairs, and are additive in giving the losses for each of the joint decisions 
to which they contribute. Risks are to be averaged with respect to a normal 
independent prior density for each of the means $\mu_1, \dots, \mu_n$, each with the same mean 
and same variance $\gamma^2\sigma^2/r$. An invariant rule is required as before. 

Because of sufficiency and invariance considerations the required rule can be 
restricted to depend on the observations through only the t vector 

$$t = A\bar{x}/(s^2/r)^{1/2} \tag{4.31}$$

where A is as defined before for (4.5), $\bar{x} = (\bar{x}_1, \dots, \bar{x}_n)'$ is the vector of sample 
means and $s^2$ is the pooled within-sample variance estimate 

$$s^2 = \sum_{i=1}^{n}\sum_{j=1}^{r} (x_{ij} - \bar{x}_i)^2 / n(r-1) \tag{4.32}$$

with $\nu = n(r - 1)$ degrees of freedom. 


The required decision rule is then given by the simultaneous application 
(4.20) of (4.30) where $t^* = t^*(k, \nu', \gamma^2)$ with k = c − 1 and $\nu' = \nu + n - 2 = 
n(r - 1) + n - 2$ and where $t'_{pq}$ can be obtained by analysis-of-variance type 
steps as follows: Put $S_t$, $S_{pq}$, $S_{rpq}$ and $S_e$ for the treatment sum of squares, the 
sum of squares for the pq difference, the residual sum of squares for the pq 
difference and the error sum of squares 

$$S_t = r\sum_{i=1}^{n} \bar{x}_i^2 - C, \qquad S_{pq} = \tfrac12 r(\bar{x}_p - \bar{x}_q)^2, \qquad S_{rpq} = S_t - S_{pq}, \qquad S_e = \sum_{i=1}^{n}\sum_{j=1}^{r} x_{ij}^2 - C - S_t, \tag{4.33}$$

respectively, where C is the correction term $(\sum_{i=1}^{n}\sum_{j=1}^{r} x_{ij})^2/nr$. Let $s'^2_{pq}$ denote 
the pooled estimate of $\sigma^2$ obtained as 

$$s'^2_{pq} = [S_e + S_{rpq}/(1 + \gamma^2)]/\nu'. \tag{4.34}$$

Then $t'_{pq}$ may be obtained as the square root of the variance ratio 

$$t'^2_{pq} = S_{pq}/s'^2_{pq}, \tag{4.35}$$

where $t'_{pq}$ is given the same sign as $\bar{x}_p - \bar{x}_q$. 


A more convenient rule for application can be obtained by expressing the 
inequalities $t'_{pq} \lessgtr t^*$ in the form $d_{pq} \lessgtr d^*$ where $d^*$ is a least significant value for 
the difference $d_{pq} = \bar{x}_p - \bar{x}_q$. From $t'^2_{pq} = t^{*2}$ we get 

$$\begin{aligned}
t^{*2} = S_{pq}/s'^2_{pq} &= \nu' S_{pq}/[S_e + (S_t - S_{pq})/(1 + \gamma^2)], \\
t^{*2}[S_e + (S_t - S_{pq})/(1 + \gamma^2)] &= \nu' S_{pq}, \\
S_{pq}[\nu' + t^{*2}/(1 + \gamma^2)] &= t^{*2}[S_e + S_t/(1 + \gamma^2)].
\end{aligned} \tag{4.36}$$

But $S_{pq} = \tfrac12 r d_{pq}^2$, hence this gives $d_{pq} = d^*$ where 

$$d^* = (2/r)^{1/2}\, t^*\, [S_e + S_t/(1 + \gamma^2)]^{1/2} \big/ [\nu' + t^{*2}/(1 + \gamma^2)]^{1/2}. \tag{4.37}$$

From this and a check on signs it follows that the multiple comparisons Bayes 
rule is given by the simultaneous application of the rules 

$$\phi^{pq*}(t) = \begin{cases} (100), & |d_{pq}| < d^*, \\ (010), & d_{pq} > d^*, \\ (001), & d_{pq} < -d^*. \end{cases} \tag{4.38}$$
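
The algebra leading from (4.34) and (4.35) to the least significant difference (4.37) can be checked numerically. The sketch below is illustrative only: the function names and the numerical inputs in the usage note are assumptions, and $t^*$ is again taken as given. It computes $d^*$ and confirms that a difference exactly equal to $d^*$ yields $t'_{pq} = t^*$.

```python
import math

def bayes_lsd(S_e, S_t, nu_prime, gamma2, t_star, r):
    """Least significant difference d* of equation (4.37)."""
    num = S_e + S_t / (1.0 + gamma2)
    den = nu_prime + t_star ** 2 / (1.0 + gamma2)
    return math.sqrt(2.0 / r) * t_star * math.sqrt(num / den)

def t_prime(d_pq, S_e, S_t, nu_prime, gamma2, r):
    """Signed t'_pq from (4.34) and (4.35), using S_pq = r * d_pq**2 / 2."""
    S_pq = 0.5 * r * d_pq ** 2
    s2_prime = (S_e + (S_t - S_pq) / (1.0 + gamma2)) / nu_prime
    return math.copysign(math.sqrt(S_pq / s2_prime), d_pq)

def decide(d_pq, d_star):
    """Rule (4.38) expressed on the scale of the difference d_pq."""
    if d_pq > d_star:
        return (0, 1, 0)
    if d_pq < -d_star:
        return (0, 0, 1)
    return (1, 0, 0)
```

For instance, with the hypothetical values S_e = 30, S_t = 40, nu' = 18, gamma^2 = 2, t* = 2 and r = 4, calling `t_prime(bayes_lsd(30, 40, 18, 2, 2, 4), 30, 40, 18, 2, 4)` returns 2 up to rounding, as (4.36) requires.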


5. Discussion. 

5.1. On the additional error degrees of freedom. The emergence of $t'_{pq} = 
d_{pq}/(2s'^2_{pq}/r)^{1/2}$ with $\nu' = \nu + n - 2$ degrees of freedom as the component test 
statistic in the multiple comparisons solution may be surprising at first but less so 
after due consideration. In giving $\mu_1, \dots, \mu_n$ identical independent normal 
distributions with variance $\gamma^2\sigma^2/r$ for risk-weighting purposes, the residual sum of 
squares between treatments, $S_{rpq} = r\sum_{i=2}^{n-1} y_i^2 = s^2\sum_{i=2}^{n-1} t_i^2$ in Example 3 for 
instance, is given the distribution of $(1 + \gamma^2)\sigma^2\chi^2_{n-2}$. On this basis 

$$s'^2_{pq} = [S_e + S_{rpq}/(1 + \gamma^2)]/\nu' \tag{5.1}$$

becomes the appropriate estimator for $\sigma^2$ in place of $s^2 = S_e/\nu$ and the result is 
as might be expected. In practice a user might sometimes be reluctant to depend 
on the prior distribution assumption to this extent however, and might even want, 
see Subsection 5.3, to use $S_{rpq}$, or better $S_t$, to decide on an appropriate value 
$\hat\gamma^2$ for $\gamma^2$. In such cases it would seem good sense to use a modified rule based on 
$t_{pq}$ instead of $t'_{pq}$. This would consist of simultaneous applications of 

$$\phi^{pq}(t) = \begin{cases} (100), & |d_{pq}| < d^*, \\ (010), & d_{pq} > d^*, \\ (001), & d_{pq} < -d^*, \end{cases} \tag{5.2}$$

where $d^*$ is the least significant difference 

$$d^* = (2s^2/r)^{1/2}\, t^* \tag{5.3}$$

with $t^* = t^*(k, \nu, \hat\gamma^2)$ based on $\nu$ degrees of freedom. 

5.2. On the independence of the least significant difference and n. By far the most 
striking feature of the multiple comparisons Bayes rule is the practically 
complete lack of dependence of the least significant difference $d^*$ on n, the number 
of means involved. (The dependence of $d^*$ on n via the estimation of error as 
discussed in the previous subsection is relatively trivial in this context and 
decreases to zero as $\nu$ increases to $\infty$.) This is a direct consequence of the additive 
losses assumption, similar results of which have also been treated by Duncan 
[3] and Thompson [15] and, in a more general form and context, by Lehmann [7]. 
In the past, a rule of this type, with $d^*$ not increasing with n, has been 
considered more or less unacceptable. The main basis of objection has been the 
rapid increase in its so-called n-treatment significance level (Duncan [1] and 
[3]) or its experimentwise error rate (Tukey [17] and [18]), 

$$\alpha_n = P\{\text{rejecting } H_n \mid H_n\}, \qquad H_n: \mu_1 = \cdots = \mu_n, \tag{5.4}$$

with respect to n. 
To illustrate, a non-increasing least significant difference of 

$$d^* = 1.960\sqrt{2}\,\sigma_{\bar{x}} = 2.77\sigma_{\bar{x}}$$

in the case $\nu = \infty$ gives the experimentwise error rates (found as upper-tail 
probabilities $P[q_n > 2.77]$ of the range $q_n$) 

$$\alpha_2 = .0500,\ \ \alpha_3 = .1223,\ \ \alpha_4 = .2034,\ \dots,\ \alpha_{20} = .9183. \tag{5.5}$$

The possibility in this case of wrongly rejecting the homogeneity hypothesis for 
20 means, for example, with a probability of 91.83 per cent, may at first appear 
to be unacceptably high. As a result, procedures have been proposed with 
increasing significant differences aimed at suppressing the rapid increases in $\alpha_n$. 
These have varied considerably from rapidly increasing significant differences 
such as (in comparable cases, dropping the factor $\sigma_{\bar{x}}$) 

$$2.77,\ \ 3.31,\ \ 3.63,\ \dots,\ 5.01, \tag{5.6}$$

to more slowly increasing ones such as 

$$2.77,\ \ 2.92,\ \ 3.02,\ \dots,\ 3.47, \tag{5.7}$$

depending on the relative importance attached to experimentwise error rates 
and the degree to which they should be suppressed. The differences (5.6), termed 
honest significant differences by Tukey [17] [18], suppress all of the 
experimentwise error rates to .0500. The differences (5.7) proposed by the author [1] and 
[3] suppress them to less conservative so-called levels based on degrees of 
freedom 

$$\alpha_2 = .0500,\ \ \alpha_3 = .0975,\ \ \alpha_4 = .1426,\ \dots,\ \alpha_{20} = .6415, \tag{5.8}$$

obtained as $\alpha_n = 1 - (1 - \alpha)^{n-1}$. A less conservative procedure yet is the 
one due to Fisher [5] that uses the same least significant difference for all n 
(2.77 in the case above) provided that first the homogeneity hypothesis $H_n$ can 
be rejected by an F ratio test. 
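
The error rates quoted in (5.5) and (5.8) can be approximated with a few lines of Python. The sketch below is an illustration under stated assumptions: it uses the standard expression $P[q_n \le w] = n\int f(x)[F(x+w) - F(x)]^{n-1}dx$ for the range of n independent standard normal variables, evaluated by a plain trapezoidal rule, and the function names and integration limits are choices made here, not taken from the paper.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def range_tail(n, w, lo=-9.0, hi=9.0, steps=6000):
    """Upper-tail probability P[q_n > w] of the range of n standard normals.

    Integrates over the position x of the smallest observation using the
    trapezoidal rule on [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        val = n * norm_pdf(x) * (norm_cdf(x + w) - norm_cdf(x)) ** (n - 1)
        total += val if 0 < i < steps else 0.5 * val
    return 1.0 - total * h

def df_level(n, alpha=0.05):
    """Level based on degrees of freedom, 1 - (1 - alpha)**(n - 1)."""
    return 1.0 - (1.0 - alpha) ** (n - 1)

w = 1.960 * math.sqrt(2.0)          # least significant difference in units of sigma
for n in (2, 3, 4, 20):
    print(n, round(range_tail(n, w), 4), round(df_level(n), 4))
```

The first column of output should agree closely with (5.5). For the second column, the formula with exponent n − 1 reproduces the first three entries of (5.8); at n = 20 it evaluates to about .6226, whereas the printed value .6415 equals $1 - (.95)^{20}$.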

Now it appears that, in a Bayes sense, provided the losses are additive and 
other things (e.g., k and $\gamma^2$) are equal, the same least significant difference is 
optimum no matter how large the number n of treatments involved. (From 
Lehmann's work [7] it is clear that this would also apply under other optimality 
criteria such as, for example, minimax.) The high experimentwise error rate of 
91.83 per cent in the case quoted might well be worth tolerating, for instance, 
because, it might be said, of the relatively low prior probability of the hypothesis 
$H_{20}$ involved and its relative unimportance among so many others. 

The inverse form of dependence of the least significant difference $d^*$ on the 
risk-weighting variance ratio $\gamma^2$ may do much to reconcile its independence of n 
with at least some of the common almost instinctive urge to make it increase 
with n. In the case $\nu = \infty$ it is directly proportional to $(1 + 1/\gamma^2)^{1/2}$ and 
approximately so for smaller values of $\nu$. If, in the conduct of a large experiment, the 
treatments under study have a lower anticipated heterogeneity than those which 
would have been studied in a more limited experiment with fewer populations, 
a lower risk-weighting variance ratio $\gamma^2$ would be appropriate and hence so would 
a larger least significant difference. Such a situation could often arise in 
practice, and, if $\gamma^2$ is varying in an interval of small values, this could make a 
substantial increase in the significant difference with an increase in n. On the 
other hand, however, the reverse situation could also arise. In selection 
experiments, for example, the treatments under consideration may be the top n 
performers as assessed by experiences in previous trials. Here, the larger the 
number of treatments it has been possible to include in an experiment, the larger 
will be the appropriate $\gamma^2$ and hence, the smaller the least significant difference. 

5.3. A practical adaptation of the Bayes rule. In the complete absence of prior 
criteria for choosing $\gamma^2$, the user might sensibly, it would seem, obtain an 
estimate of it from the variance ratio 

$$F = \frac{r\sum_{i=1}^{n-1} y_i^2/(n-1)}{s^2} \quad \Bigl(= \frac{1}{n-1}\sum_{i=1}^{n-1} t_i^2 \text{ in Example 3}\Bigr), \tag{5.9}$$

employed in the preliminary F test of Fisher's least-significant-difference 
procedure. Since the ratio of the corresponding expectations is $1 + \gamma^2$ he might for 
example put $\hat\gamma^2 = F - 1$, enter Table 1 with $\gamma^2 = \hat\gamma^2$, and use the simple direct 
rule as given by (5.2) in Subsection 5.1. It is of considerable interest to note the 
closeness of Fisher's earlier procedure to this type of adaptation. Both rules 
depend on a preliminary inspection of the F ratio. In Fisher's rule a big or small 
F ratio leads to the use of an independently chosen least significant difference on 
a go-no basis. In the new rule a big or small F ratio leads to the use of a small 
or big least significant difference on a continuously related basis. 
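
A minimal sketch of the first step of this adaptation is given below, assuming n samples of equal size r held as a list of lists. The function name `gamma2_estimate` is a hypothetical choice, and the treatment mean square is computed from deviations of the sample means about their grand mean, which equals the numerator of (5.9) when the rows of A are orthonormal contrasts. Looking up $t^*$ in Table 1 is outside the scope of the sketch.

```python
def gamma2_estimate(groups):
    """Return (F, gamma2_hat) with gamma2_hat = F - 1, as suggested for (5.9).

    groups : list of n samples, each a list of r observations (equal r assumed)
    """
    n = len(groups)
    r = len(groups[0])
    means = [sum(g) / r for g in groups]
    grand = sum(means) / n
    # Treatment sum of squares on n - 1 degrees of freedom
    S_t = r * sum((m - grand) ** 2 for m in means)
    # Error sum of squares on n(r - 1) degrees of freedom
    S_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    s2 = S_e / (n * (r - 1))
    F = (S_t / (n - 1)) / s2
    return F, F - 1.0
```

For the toy data `[[1, 2, 3], [2, 3, 4], [6, 7, 8]]` this gives F = 21 and an estimate of 20. When F falls below 1 the estimate is negative, a case the text does not discuss and which the sketch leaves to the user.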

5.4. Concluding remarks. As presented the model is limited to a class of 
symmetric problems in which the loss and prior probabilities are invariant with 
respect to all n! permutations of the means involved. Many problems in practice, 
however, are naturally taken to be symmetric in this way. Within this class, the 
assumptions of linear losses and normal prior densities would seem directly 
appropriate for some problems and useful at least as good approximations for 
many others. 

From the given development it is fairly clear that similar rules for a wider 
class of less symmetrical problems can be obtained, leading to the use of 
different significant t ratios for each of the component or even each of the 
subcomponent problems involved. Development and discussion of these are deferred to 
a further paper. 

A most interesting point is one raised (in private correspondence) by 
Professor F. J. Anscombe following from the type of discussion in Subsection 5.3. 
In addition to providing an estimate of the prior variance $\gamma^2$ and therefore a 
means of rejecting an assumed value for this parameter, the data may provide 
evidence for reasonably rejecting other assumed features of the prior density. 
Further developments are needed for handling problems of this type. 

In conclusion, it is worth repeating, the most important result discussed in 
Subsection 5.2, namely the independence of the least significant difference $d^*$ and 
the number of component problems involved, depends only on the additivity 
assumption for the losses. It is independent of the form of the component loss 
functions and of the prior density assumed. It appears further that the same result 
would follow even if the class of component problems were extended to include 
all contrasts among the means as considered by Scheffé [11]. Thus in a symmetric 
situation, for example, the same significant t ratio would be appropriate whether 
it be desired to test just one comparison chosen a priori, the set of all n(n − 1) 
comparisons in the multiple comparisons problem or the set of all contrasts. 
The additivity-of-losses assumption on which this critically depends appears to 
be a reasonable one, and appropriate to many practical situations. 


6. Computation of significant t ratios. In computing the values in Table 1 the 
ratios g(y)/g(−y) in (2.22) may be simplified first to $g_1(y)/g_1(-y)$ where 

$$g_1(y) = \begin{cases}
(1 - y^2)^{1/2} + y\sin^{-1}y + \pi y/2, & \nu = 1, \\
(1 + y)^2, & \nu = 2, \\
2(1 - y^2)^{3/2} + 3y^2(1 - y^2)^{1/2} + 3y\sin^{-1}y + 3\pi y/2, & \nu = 3, \\
(1 + y)^3/(3 + y), & \nu = 4, \\
(1 + y)^4/(5 + 4y + y^2), & \nu = 6, \\
(1 + y)^8/(429 + 1384y + 2063y^2 + 1776y^3 + 915y^4 + 264y^5 + 32y^6), & \nu = 14,
\end{cases} \tag{6.1}$$

or to $g_2(z)/g_2(-z)$ where 

$$g_2(z) = f(z) + zF(z), \tag{6.2}$$

and f and F are the standard normal density and cumulative distribution 
functions respectively. These follow readily from (2.23) except for (6.2) in the case 
$\nu = \infty$, which can be obtained as follows. 

Putting $y = z/(\nu + z^2)^{1/2}$ and thus $(1 - y^2) = 1/(1 + z^2/\nu)$ and $dy = 
dz/[\nu^{1/2}(1 + z^2/\nu)^{3/2}]$ in g(y) in (2.23) we get 


Gir nice See ese { 5%. i 
=. (1+ 2/v)?? (1 + 2/v)' Jo (1 + w/v) err 
__(av)'[(v = 2)/2}t2 
[(v — 1)/2]!201 + 2/v)) 
(6.3) 

Recalling that the probability density function of the Student t distribution is 

$$h(t \mid \nu) = \frac{[(\nu - 1)/2]!}{(\pi\nu)^{1/2}\,[(\nu - 2)/2]!}\,(1 + t^2/\nu)^{-(\nu+1)/2} \tag{6.4}$$

we may write 

$$g(y) \propto (1 + z^2/\nu)^{1/2}\, h(z \mid \nu) + \frac{zH(z \mid \nu)}{(1 + z^2/\nu)^{1/2}} \propto (1 + z^2/\nu)\,h(z \mid \nu) + zH(z \mid \nu) = g_3(z) \text{ say}, \tag{6.5}$$

where $H(z \mid \nu)$ is the cumulative distribution $\int_{-\infty}^{z} h(t \mid \nu)\, dt$. Treating g(−y) in 
the same way we reach the result 

$$g(y)/g(-y) = g_3(z)/g_3(-z), \tag{6.6}$$

which reduces to $g_2(z)/g_2(-z)$ as $\nu \to \infty$. 
Next, $y^*(k, \nu)$ or $z^*(k)$ is found as the solution of $g_1(y)/g_1(-y) = k$ or 
$g_2(z)/g_2(-z) = k$ respectively. Finally $t^*$ is found as the positive square root of 

$$t^{*2} = \nu/(\beta^2/y^{*2} - 1) \quad \text{or from} \quad t^* = z^*/\beta. \tag{6.7}$$
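
For the limiting case $\nu = \infty$ the root $z^*(k)$ of $g_2(z)/g_2(-z) = k$ is easy to locate numerically, since the ratio equals 1 at z = 0 and increases with z. The bisection sketch below is one possible way to do this; the function names, the bracket [0, 8] and the tolerance are assumptions made for illustration and say nothing about how Table 1 was actually computed.

```python
import math

def f(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def F(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def g2(z):
    """Equation (6.2): g2(z) = f(z) + z F(z)."""
    return f(z) + z * F(z)

def ratio(z):
    return g2(z) / g2(-z)

def z_star(k, lo=0.0, hi=8.0, tol=1e-10):
    """Solve g2(z)/g2(-z) = k for z by bisection (requires k > 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a usage example, `z_star(100)` returns a value of about 1.72; the corresponding $t^*$ then follows from the second relation in (6.7).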
THE USE OF LEAST FAVORABLE DISTRIBUTIONS IN 
TESTING COMPOSITE HYPOTHESES 


By H. E. REINHARDT 
Montana State University 

0. Introduction and summary. The usual method of finding a most powerful 
size $\alpha$ test of a composite hypothesis against a simple alternative is the guessing 
of a Least Favorable Distribution (LFD), introduced at various levels of 
generality by Neyman and Pearson [6], Wald [7], Lehmann [4], and Lehmann 
and Stein [5], and testing the mixture of the distributions of the hypothesis over 
this LFD against the alternate using the Neyman-Pearson Fundamental Lemma. 
In guessing LFD's statisticians have looked for a mixture which is "like" the 
alternate. 

In this paper, the notion of Uniformly Least Favorable Mixture (ULFM) is 
introduced. In Section 2, we show that a ULFM is a point in the convex set of 
mixtures of the hypothesis which is closest (in the sense of the $\mathcal{L}^1$ norm) to the 
alternate. The condition is not sufficient. More generally, any LFM corresponds 
to a point which is closest to the alternate in some expansion or contraction of 
this set of mixtures. A sufficient condition for ULFM's is, essentially, that the 
nuisance parameter can take on the same values in the alternate as in the 
hypothesis. In Section 3, we consider the case where no ULFM exists. We show, 
inter alia, that any distribution is least favorable for a closed set of $\alpha$'s. (A 
pathological example shows that this closed set need not be the union of a finite 
number of closed intervals.) 


1. Notation and definitions. We consider a family $f_\theta$, $\theta \in \Omega$, of densities and 
a density g with respect to a $\sigma$-finite measure $\mu$ over a measurable space $(\mathcal{X}, \mathcal{A})$. 
For tests $\phi$, i.e., measurable functions $\phi$ such that $0 \le \phi(x) \le 1$ for all x, we shall 
use the inner product notation 

$$(\phi, h) = \int \phi(x)\,h(x)\, d\mu. \tag{1}$$

For the problem of testing a composite hypothesis H: $f_\theta$, $\theta \in \Omega$, against the simple 
alternative g a most powerful level $\alpha$ test is a test $\phi$ which maximizes the power 
$(\phi, g)$ among all tests satisfying $(\phi, f_\theta) \le \alpha$ for all $\theta \in \Omega$. 

We assume there is a $\sigma$-algebra $\mathcal{B}$ on the indexing set $\Omega$ such that $f_\theta(x)$ is 
measurable on $\Omega \times \mathcal{X}$. If $\lambda$ is a probability measure over $(\Omega, \mathcal{B})$ we define the 
mixture $f_\lambda$ by 

$$f_\lambda(x) = \int_\Omega f_\theta(x)\, d\lambda(\theta). \tag{2}$$
Then $\lambda$ is called a least favorable distribution and $f_\lambda$ a least favorable mixture (LFM) 
at level $\alpha$ for testing H: $f_\theta$, $\theta \in \Omega$, against g if a most powerful level $\alpha$ test of $f_\lambda$ 
against g is also most powerful for testing H against g. If $f_\lambda$ is least favorable at 
every level $\alpha$, $0 \le \alpha \le 1$, $f_\lambda$ is called a uniformly least favorable mixture (ULFM) 
for testing H against g. 

To obtain a most powerful test for testing H: $f_\theta$, $\theta \in \Omega$, against g and a least 
favorable distribution $\lambda$ or mixture $f_\lambda$ we shall frequently use the following 
generalization of the fundamental lemma of Neyman and Pearson by Lehmann 
and Stein ([5], Theorem 1, or Corollary 5, p. 92, of [3]): 

THEOREM 1.1. Suppose that $\lambda$ is a probability distribution over $\Omega$ and that $\Omega'$ is a 
subset of $\Omega$ with $\lambda(\Omega') = 1$. Let $\phi$ be a test such that 

$$\phi(x) = 1 \ \text{ if } g(x) > kf_\lambda(x), \qquad \phi(x) = 0 \ \text{ if } g(x) < kf_\lambda(x).$$

Then $\phi$ is a most powerful level $\alpha$ test for testing H against g provided 

$$(\phi, f_{\theta'}) = \sup_{\theta \in \Omega}(\phi, f_\theta) = \alpha \quad \text{for all } \theta' \in \Omega'. \tag{3}$$


2. Uniformly least favorable distributions. In [3] and [5] Lehmann and 
Lehmann and Stein, respectively, give examples of problems where the least 
favorable distribution depends on the level of significance. The case where the 
LFD is independent of the level of significance is more tractable and we consider 
it first. The following theorem shows the relation between ULFM's and the 
$\mathcal{L}^1$ norm: 

THEOREM 2.1. If $f_\lambda$ is a ULFM for testing H: $f_\theta$, $\theta \in \Omega$, against g, then $f_\lambda - g$ 
is a point of smallest norm in the convex set $\{f_\lambda - g\}$. (Here $\{f_\lambda\}$ is the convex set of 
mixtures of $f_\theta$'s formed by averaging with respect to a probability measure on 
the space $\Omega$. Thus $\{f_\lambda\}$ is a convex set in the positive part of the unit sphere in an 
$\mathcal{L}^1$ space.) 

We omit the direct proof of Theorem 2.1 since it is a special case (for k = 1) 
of Theorem 3.2. 

For Bernoulli distributions with a single trial the situation is particularly 
simple. The positive part of the unit sphere is the line segment in the Euclidean 
plane consisting of points (x, 1 − x) with $0 \le x \le 1$. Thus the convex set 
spanned by the hypothesis is a line segment which we can take to be closed. The 
closest point to the alternate is the one whose first co-ordinate minimizes |x − p| 
(where (p, 1 − p) is the distribution under the alternate). This closest point is, 
in fact, a ULFM. (If min |x − p| = 0, the alternate is in the hypothesis and 
$\phi(x) \equiv \alpha$ is most powerful. If min |x − p| ≠ 0 there is a non-trivial test.) 
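
The Bernoulli case is simple enough to illustrate directly. In the sketch below the closed segment spanned by the hypothesis is described by assumed end points `lo` and `hi` for the first co-ordinate; these names, and the function names, are hypothetical and introduced only for the example.

```python
def closest_mixture(p, lo, hi):
    """First co-ordinate x of the point (x, 1 - x), lo <= x <= hi,
    that is closest in L1 norm to the alternate (p, 1 - p)."""
    return min(max(p, lo), hi)

def l1_distance(x, p):
    """L1 distance between (x, 1 - x) and (p, 1 - p), equal to 2|x - p|."""
    return abs(x - p) + abs((1.0 - x) - (1.0 - p))
```

For example, with the hypothesis segment 0.2 ≤ x ≤ 0.5 and alternate p = 0.7, `closest_mixture(0.7, 0.2, 0.5)` returns 0.5, at L1 distance 0.4 from the alternate. When p lies inside the segment the distance is 0 and the alternate belongs to the hypothesis.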

If we were able to find the LFM for a sample of n independent observations 
knowing it for a sample of one observation, we would not have to do the problem 
over for each sample size. Such an inductive procedure is possible in a fairly 
special case. 
LEMMA 2.2. If $\mu\{x \mid f(x) = kg(x)\} = 0$ for every k, then 

$$(\mu \times \mu)\{(x, y) \mid f(x)f(y) = kg(x)g(y)\} = 0$$

for every k. 
PROOF. Let $E = \{(x, y) \mid f(x)f(y) = kg(x)g(y)\}$. Consider a section $E^{y_0}$ of 
E by $y_0$, i.e., the set of all x such that $(x, y_0) \in E$. Hence $x \in E^{y_0}$ if and only if 

$$f(x) = kg(x)g(y_0)/f(y_0).$$

Then by the condition of the lemma $\mu(E^{y_0}) = 0$, and this is true for all $y_0$ except 
possibly the set where $f(y_0) = 0$. But this set has measure 0 by condition (taking 
k = 0). Thus all y sections have measure 0 and hence E has product measure 
0 ([2], Theorem 36A). 

THEOREM 2.3. Let $f_{\theta_0}$ be a ULFM for testing $H_1$: $f_\theta$, $\theta \in \Omega$, against g; if $\theta_0 \in \Omega$ 
and if $\mu\{x \mid f_{\theta_0}(x) = kg(x)\} = 0$ for each k, then for n independent observations of 
X, the density $\prod_{i=1}^{n} f_{\theta_0}(x_i)$ is uniformly least favorable for testing $H_n$: $\prod_{i=1}^{n} f_\theta(x_i)$, 
$\theta \in \Omega$, against $\prod_{i=1}^{n} g(x_i)$. 

PROOF. We prove the statement only for n = 2. The proof can easily be 
generalized by an induction. 

By Lemma 2.2 and the Neyman-Pearson Lemma there is for each $\alpha$ a most 
powerful level $\alpha$ test $\phi$ for testing $f_{\theta_0}(x)f_{\theta_0}(y)$ against g(x)g(y) such that 

$$\phi(x, y) = 1 \ \text{ if } g(x)g(y) > kf_{\theta_0}(x)f_{\theta_0}(y), \quad \text{and} \quad \phi(x, y) = 0 \ \text{ if } g(x)g(y) \le kf_{\theta_0}(x)f_{\theta_0}(y).$$

Thus $\phi$ is the indicator function $I_S$ of a set S. Since $f_{\theta_0}$ is a ULFM, the section 
$S^y$ of S by y is a most powerful test for testing $H_1$ against g for g(y) > 0 and $S^y 
= \emptyset$ for g(y) = 0. Hence 

$$\int_{S^y} [f_{\theta_0}(x) - f_\theta(x)]\, d\mu(x) \ge 0 \tag{4}$$

and similarly 

$$\int_{S^x} [f_{\theta_0}(y) - f_\theta(y)]\, d\mu(y) \ge 0. \tag{5}$$

Applying Fubini's theorem we obtain 

$$\int_S [f_{\theta_0}(x)f_{\theta_0}(y) - f_\theta(x)f_\theta(y)]\, d(\mu \times \mu) \ge 0. \tag{6}$$

Thus $\phi = I_S$ is uniformly most powerful for testing $H_2$ against g(x)g(y) (see [3]). 
Hence $f_{\theta_0}(x)f_{\theta_0}(y)$ is a ULFM. 

The LFM has to be in $H_1$, for otherwise we would be looking at 

$$\prod_{i=1}^{n} \int f_\theta(x_i)\, d\lambda(\theta);$$

and this, in general, is not of the form $\int \prod_{i=1}^{n} f_\theta(x_i)\, d\lambda(\theta)$. It is mixtures of the 
latter sort that are available as potential LFM's. 

In many densities of practical importance, the most natural parametrization 
is given by an indexing set which is a product space. (For normal distributions 
the pair $(\mu, \sigma)$ where $\mu$ is the mean and $\sigma^2$ the variance is a natural choice.) 
In many hypothesis testing problems, the statistician is only interested in one 
co-ordinate of the parameter. However, the nature of the observations he can 
make forces him to consider the other co-ordinates (traditionally called 
"nuisance parameters"). 

For problems of a certain form we can eliminate the nuisance parameters 
from consideration. Toward this end, let X, Y be independent random variables 
whose joint probability measure is absolutely continuous with respect to the 
product measure $\mu \times \nu$; thus, 

$$P((X, Y) \in A) = \int_A f_\theta(x)\,g_\eta(y)\, d(\mu \times \nu). \tag{7}$$


THEOREM 2.4. If there exists a probability measure $\lambda$ on $\mathrm{H}$ such that $h(y) = 
\int_{\mathrm{H}} g_\eta(y)\, d\lambda(\eta)$, then $f_{\theta_0}(x)h(y)$ is a ULFM for testing H: $f_{\theta_0}(x)g_\eta(y)$, $\eta \in \mathrm{H}$, 
against $f_{\theta_1}(x)h(y)$. 

PROOF. A most powerful level $\alpha$ test for testing $f_{\theta_0}(x)h(y)$ against $f_{\theta_1}(x)h(y)$ 
is given by 

$$\phi(x) = 1 \ \text{ if } f_{\theta_1}(x)h(y) > kf_{\theta_0}(x)h(y), \qquad \phi(x) = 0 \ \text{ if } f_{\theta_1}(x)h(y) < kf_{\theta_0}(x)h(y),$$

where k is chosen so $\int \phi(x)f_{\theta_0}(x)\, d\mu = \alpha$. Since $\phi$ is independent of y it follows 
that $\phi$ is most powerful for testing H against $f_{\theta_1}(x)h(y)$. Since $\alpha$ is arbitrary 
$f_{\theta_0}(x)h(y)$ is uniformly least favorable. (Theorem 2.1 suggests that this is a 
natural candidate for ULFM.) 

COROLLARY 2.5. (See [1], p. 86.) For testing H: $f_{\theta_0}(x)g_\eta(y)$, $\eta \in \mathrm{H}$, against 
$f_{\theta_1}(x)g_\eta(y)$, $\eta \in \mathrm{H}$, there is a uniformly most powerful test given by 

$$\phi(x, y) = 1 \ \text{ if } f_{\theta_1}(x) > kf_{\theta_0}(x), \qquad \phi(x, y) = 0 \ \text{ if } f_{\theta_1}(x) < kf_{\theta_0}(x).$$

PROOF. Apply the theorem to the problem of testing $f_{\theta_0}(x)g_\eta(y)$, $\eta \in \mathrm{H}$, against 
$f_{\theta_1}(x)g_{\eta_1}(y)$ for each $\eta_1$. A ULFM is $f_{\theta_0}(x)g_{\eta_1}(y)$. 

The corollary says, in effect, that we can not get information about $\theta$ by 
observing a random variable whose distribution is independent of $\theta$! 

Theorem 2.4 is a generalization of the method to obtain a uniformly most 
powerful test for the following example, used by Lehmann and Stein [5] (or see 
[3], p. 96). 

EXAMPLE. Let $X_1, \dots, X_n$ be independently normally distributed with mean 
$\eta$ and variance $\theta$. For given $\theta_1 > \theta_0$ we wish to test the composite hypothesis 
H: $\theta = \theta_0$, $-\infty < \eta < +\infty$, against the simple alternative $\theta = \theta_1$, $\eta = \eta_1$. 
The normal distribution of the parameter $\eta$ with mean $\eta_1$ and variance $(\theta_1 - \theta_0)/n$ 
is uniformly least favorable. 


3. The non-uniform case. In the case where no ULFM exists, a result similar 
to Theorem 2.1 holds. 


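LEMMA 3.1. Let f and g be probability densities with respect to $\mu$; if k > 0, 
then 

$$\|kf - g\| = k - 1 + 2\sup_{A \in \mathcal{A}} \int_A (g(x) - kf(x))\, d\mu. \tag{8}$$

PROOF. 

$$\begin{aligned}
\|kf - g\| &= \int |kf(x) - g(x)|\, d\mu \\
&= \int (kf(x) - g(x))\, d\mu + 2\int_{g > kf} (g(x) - kf(x))\, d\mu \\
&= k - 1 + 2\sup_{A \in \mathcal{A}} \int_A (g(x) - kf(x))\, d\mu.
\end{aligned}$$
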
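
Identity (8) is easy to check on a finite sample space, where densities are probability vectors and the supremum over sets can be found by enumeration. The following sketch does this by brute force; the function names and the example vectors in the usage note are illustrative assumptions.

```python
from itertools import combinations

def l1_norm_diff(k, f, g):
    """Left side of (8): the L1 norm of k*f - g on a finite sample space."""
    return sum(abs(k * fi - gi) for fi, gi in zip(f, g))

def sup_over_sets(k, f, g):
    """Supremum over all subsets A of the sum over A of (g - k*f).

    The empty set contributes 0; enumeration is feasible only for small spaces.
    """
    idx = range(len(f))
    best = 0.0
    for m in range(1, len(f) + 1):
        for A in combinations(idx, m):
            best = max(best, sum(g[i] - k * f[i] for i in A))
    return best

def rhs(k, f, g):
    """Right side of (8)."""
    return k - 1.0 + 2.0 * sup_over_sets(k, f, g)
```

With f = (0.5, 0.3, 0.2), g = (0.2, 0.3, 0.5) and k = 1.5, both sides equal 0.9; the supremum is attained on the set where g > kf, as in the proof.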
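THEOREM 3.2. Suppose $f_{\lambda_\alpha}(x)$ is an LFM at the level $\alpha$ for testing H: $f_\theta$, $\theta \in \Omega$, 
against g and the most powerful test is given by 

$$\phi(x) = 1 \ \text{ if } g(x) > kf_{\lambda_\alpha}(x), \qquad \phi(x) = 0 \ \text{ if } g(x) < kf_{\lambda_\alpha}(x),$$

then $kf_{\lambda_\alpha} - g$ is a point of smallest norm in the convex set $\{kf_\lambda - g\}$. 

PROOF. Consider any mixture $f_\lambda$ of $f_\theta$'s. Since $\phi$ is a most powerful level $\alpha$ 
test we have 

$$\int \phi(x)\,f_\lambda(x)\, d\mu \le \alpha. \tag{9}$$

Hence by Lemma 3.1 and by definition of $\phi$ 

$$\begin{aligned}
\|kf_\lambda - g\| &= k - 1 + 2\sup_{A \in \mathcal{A}} \int_A (g(x) - kf_\lambda(x))\, d\mu \\
&\ge k - 1 + 2\int \phi(x)(g(x) - kf_{\lambda_\alpha}(x))\, d\mu = \|kf_{\lambda_\alpha} - g\|.
\end{aligned}$$

The inequality uses (9) together with the fact that $\phi$ has size $\alpha$ under $f_{\lambda_\alpha}$. 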

In solving a problem where there is no ULFM, we would like to know how to 
proceed from an LFM for $\alpha$ to one for $\alpha_1$. We prove three useful theorems. 
THEOREM 3.3. The set of LFM's for a particular $\alpha$ is convex. 

PROOF. Suppose $f_1$ and $f_2$ are both LFM's at level $\alpha$ and suppose $0 \le a \le 1$. 
Let a most powerful test be $\phi$. Then 

$$\phi(x) = 1 \ \text{ if } g(x) > k_if_i(x), \qquad \phi(x) = 0 \ \text{ if } g(x) < k_if_i(x)$$

for i = 1, 2. Let $k_0$ and b be determined by 

$$k_0a = k_1b \quad \text{and} \quad k_0(1 - a) = k_2(1 - b). \tag{10}$$

Then 

$$\phi(x) = 1 \ \text{ if } g(x) > k_0[af_1(x) + (1 - a)f_2(x)], \qquad \phi(x) = 0 \ \text{ if } g(x) < k_0[af_1(x) + (1 - a)f_2(x)].$$

Thus $\phi$ is a most powerful test for $af_1 + (1 - a)f_2$ vs. g and consequently $af_1 + 
(1 - a)f_2$ is least favorable. 

COROLLARY 3.4. Suppose $f_1$ and $f_2$ are both least favorable at level $\alpha$, and the k's 
of the fundamental lemma are $k_1$ and $k_2$ with $k_1 \ge k_2 > 0$. Then for any $k_0$, $k_1 \ge 
k_0 \ge k_2$, there is an a, $0 \le a \le 1$, such that the k of the fundamental lemma for 
testing $af_1 + (1 - a)f_2$ against g is $k_0$. 

PROOF. Determine a and b from (10). The corollary then follows from the 
proof of Theorem 3.3. 

THEOREM 3.5. For testing H: $f_\theta$, $\theta \in \Omega$, against g, any $f_\lambda$ is least favorable for a 
closed set of $\alpha$'s (which may be empty). 

PROOF. Suppose $\lim_n \alpha_n = \alpha$ and $f_\lambda$ is least favorable at level $\alpha_n$ for each 
positive integer n. We show $f_\lambda$ is least favorable at level $\alpha$. Let the most powerful 
level $\alpha_n$ test be $\phi_n$. Then, because of the weak* compactness of the set of tests 
([3], Appendix 4), there is a $\phi(x)$, $0 \le \phi(x) \le 1$, and a subsequence of $\{\phi_n\}$, 
which we take to be $\{\phi_n\}$ itself, such that $\lim_n (\phi_n, f) = (\phi, f)$ for every f. In 
particular we have (since $\phi_n$ is level $\alpha_n$) 

$$(\phi_n, f_\theta) \le \alpha_n \tag{11}$$

so that 

$$(\phi, f_\theta) \le \alpha. \tag{12}$$

Thus $\phi$ is level $\alpha$ for H. 
Similarly, since $f_\lambda$ is least favorable at level $\alpha_n$, 

$$(\phi, f_\lambda) = \alpha \tag{13}$$

and $\phi$ is level $\alpha$ for $f_\lambda$ vs. g. 
Let the power of $\phi_n$ be $1 - \beta_n$, so 

$$(\phi_n, g) = 1 - \beta_n. \tag{14}$$

Now power is a concave and hence continuous function of size so the most 
powerful level $\alpha$ test of $f_\lambda$ vs. g has power $1 - \beta = \lim_n(1 - \beta_n)$. But $(\phi, g) = 
1 - \beta$, so $\phi$ is a most powerful level $\alpha$ test of $f_\lambda$ vs. g, and satisfies the size 
requirements of the original problem. Hence $f_\lambda$ is least favorable. 

Before proceeding to the final theorem of this section, we remark that for 
testing f vs. g (both simple) and for any k > 0, there is an $\alpha$ such that the most 
powerful level $\alpha$ test of f vs. g is 

$$\phi(x) = 1 \ \text{ if } g(x) > kf(x), \qquad \phi(x) = 0 \ \text{ if } g(x) < kf(x).$$

Clearly we can take any $\alpha$ satisfying 

$$\int_{kf < g} f\, d\mu \le \alpha \le \int_{kf \le g} f\, d\mu. \tag{15}$$

Further, if we let $k_\alpha$ be the (not necessarily unique) k of the Fundamental 
Lemma such that $\phi$ is a most powerful level $\alpha$ test of f vs. g, then $\alpha_2 < \alpha_1$ implies 
$k_{\alpha_2} \ge k_{\alpha_1}$. 

THEOREM 3.6. Suppose for testing H: $f_\theta$, $\theta \in \Omega$, against g there are a finite number 
of mixtures of the hypothesis, $f_1, \dots, f_m$ (not necessarily distinct), each least 
favorable for an interval $I_i$ of $\alpha$'s such that 

$$\bigcup_{i=1}^{m} I_i = (0, 1). \tag{16}$$

Then there is an LFM for some $\alpha$ which is a point closest to g (in $\mathcal{L}^1$ norm) among 
the points of the convex set $\{f_\lambda\}$. 

PROOF. Let the ends of the intervals be $0 = \alpha_0 < \alpha_1 < \cdots < \alpha_m = 1$. We 
can assume, without loss of generality, that $f_i$ is least favorable for $\alpha_{i-1} \le \alpha \le \alpha_i$. 
Let $k(\alpha, i)$ be the k of the Fundamental Lemma for testing $f_i$ against g at level $\alpha$. 
We can take $k(\alpha_0, 1) = \infty$ and $k(\alpha_m, m) = 0$. Hence there is an i such that 
$k(\alpha_{i-1}, i) \ge 1 \ge k(\alpha_i, i)$ or $k(\alpha_i, i) \ge 1 \ge k(\alpha_i, i + 1)$. In the first case, 
by the remark above there is an $\alpha$, $\alpha_{i-1} \le \alpha \le \alpha_i$, such that $k(\alpha, i) = 1$ and 
hence by Theorem 3.2 $f_i$ is closest to g. In the second case by Theorem 3.3, 
$f_\lambda = af_i + (1 - a)f_{i+1}$ is least favorable at level $\alpha_i$, and for a suitable choice of 
a the k of the Fundamental Lemma is 1 for $f_\lambda$ by Corollary 3.4. Then $f_\lambda$ is a point 
closest to the alternate. 

Thus one might be able to proceed stepwise—finding an LFM as a mixture 
closest to the alternate, then finding the set of a’s for which this is least favorable. 
For each point which is a boundary point of this set of a’s there may be another 
LFM. (This is true if there are only a finite number of LFM’s.) For this LFM, 
we could proceed as with the first one. This procedure works perfectly for some 
problems which will be considered elsewhere. The theorems suggest, however, 
that LFM’s are going to be difficult to find in general. 

We conclude with a pathological example. We take as sample space the
positive integers; the distributions can be represented as sequences $\{a_n\}$ with
$a_n \ge 0$ and $\sum a_n = 1$. We consider three sequences defined inductively.

$a_1 = b_1 = 1/3$,   $c_n = (1/2)^n$
$a_2 = 2/9 + \epsilon$,   $b_2 = 2/9 - \epsilon$
$a_3 = 4/27 - \epsilon$,   $b_3 = 4/27 + \epsilon$
$a_4 = b_4 = 2^3/3^4$
$a_5 = (2/3)^3 b_2$,   $b_5 = (2/3)^3 a_2$
$a_6 = (2/3)^3 b_3$,   $b_6 = (2/3)^3 a_3$
$a_{n+6} = (2/3)^6 a_n$,   $b_{n+6} = (2/3)^6 b_n$

where $0 < \epsilon < 2/81$. Then the sequences $\{a_n/c_n\}$ and $\{b_n/c_n\}$ are monotone
increasing and

$\sum_{n=1}^{6k+2} a_n > \sum_{n=1}^{6k+2} b_n$,   $\sum_{n=1}^{6k+5} a_n < \sum_{n=1}^{6k+5} b_n$   for $k = 0, 1, 2, \cdots$.

Hence for testing $H$: $\{a_n\}$, $\{b_n\}$ against $\{c_n\}$ any most powerful level $\alpha$
test is of the form

$\phi(n) = 1$ if $n < n_0$,
$\phi(n) = 0$ if $n > n_0$;

however no uniformly least favorable distribution exists. Further $\{a_n\}$ is least
favorable for a closed set of $\alpha$'s which is not the union of a finite number of
closed intervals.
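
The stated properties of this example can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original argument: it assumes the inductive definition is $a_1 = b_1 = 1/3$, $a_2 = 2/9 + \epsilon$, $b_2 = 2/9 - \epsilon$, $a_3 = 4/27 - \epsilon$, $b_3 = 4/27 + \epsilon$, $a_4 = b_4 = 2^3/3^4$, then $a_5, a_6, b_5, b_6$ obtained by swapping and scaling by $(2/3)^3$, and period-6 scaling by $(2/3)^6$ afterwards. The function names `sequences` and `check_example` are hypothetical.

```python
from fractions import Fraction as Fr


def sequences(eps, n_terms):
    """Return 1-based lists a, b, c (index 0 unused) with n_terms >= 6 entries."""
    r = Fr(2, 3)
    a = [None, Fr(1, 3), Fr(2, 9) + eps, Fr(4, 27) - eps, Fr(8, 81)]
    b = [None, Fr(1, 3), Fr(2, 9) - eps, Fr(4, 27) + eps, Fr(8, 81)]
    # Terms 5 and 6 swap the roles of a and b and scale by (2/3)^3.
    a += [r**3 * b[2], r**3 * b[3]]
    b += [r**3 * a[2], r**3 * a[3]]
    # From then on both sequences repeat with period 6, scaled by (2/3)^6.
    for n in range(7, n_terms + 1):
        a.append(r**6 * a[n - 6])
        b.append(r**6 * b[n - 6])
    c = [None] + [Fr(1, 2) ** n for n in range(1, n_terms + 1)]
    return a[: n_terms + 1], b[: n_terms + 1], c


def check_example(eps, periods=4):
    """Check monotone ratios and the alternating partial-sum inequalities."""
    n_terms = 6 * periods
    a, b, c = sequences(eps, n_terms)
    monotone = all(
        a[n] / c[n] < a[n + 1] / c[n + 1] and b[n] / c[n] < b[n + 1] / c[n + 1]
        for n in range(1, n_terms)
    )
    crossings = all(
        sum(a[1 : 6 * k + 3]) > sum(b[1 : 6 * k + 3])      # partial sums to 6k+2
        and sum(a[1 : 6 * k + 6]) < sum(b[1 : 6 * k + 6])  # partial sums to 6k+5
        for k in range(periods)
    )
    return monotone, crossings
```

Calling `check_example(Fr(1, 100))` evaluates both conditions for a value of $\epsilon$ inside the allowed range; trying a value above $2/81$ shows where the monotonicity of $\{b_n/c_n\}$ breaks down.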


The example is a fairly natural one to construct from the point of view of
hypothesis testing. What it says about the $\mathcal{L}^1$ norm as a byproduct is something
of a surprise (at least to the originator).

ASYMPTOTIC EFFICIENCY IN POLYNOMIAL ESTIMATION


By PAUL G. HOEL
University of California, Los Angeles


1. Summary. Asymptotic formulas are obtained for the generalized variance
of the least squares estimates in polynomial regression under the assumption that
the basic random variables are those of a stationary stochastic process, or a slight
generalization of such a process. These formulas are used to study the information
obtained by increasing the number of observational points in an interval
and by increasing the length of the interval.


2. Introduction. In an earlier paper [1], some limited results were obtained on 
the increased efficiency of estimation in polynomial regression due to increasing 
the number of observational points, under the assumption that the basic random 
variables are correlated. These results were for two special stochastic processes 
only. In this paper, somewhat more general stochastic processes are studied and 
corresponding asymptotic formulas are obtained. 

The same notation will be used here as in [1]. Thus, $y_1, y_2, \cdots, y_n$ will denote
random variables associated with the fixed values $x_1, x_2, \cdots, x_n$, and the
regression polynomial will be denoted by

$E(y_i) = \beta_0 + \beta_1 x_i + \cdots + \beta_k x_i^k$.

For convenience the interval $(0, a)$ will be chosen as the interval over which
observations are to be taken. Furthermore, in the development of the theory, the
observation points $x_1, x_2, \cdots, x_n$ will be chosen to be the $n$ equally spaced points
given by the formula $x_i = i\delta$, where $\delta = a/n$.

The variables $y_1, y_2, \cdots, y_n$ will be assumed to be those of a stationary
stochastic process. Thus, the $y$'s possess a common variance, and the correlation
between $y_i$ and $y_j$ is a function of $|i - j|\delta$ only. As a result, the covariance matrix
$S$ can be written in the form

$S = \sigma^2 \begin{pmatrix} 1 & r_1 & r_2 & \cdots & r_{n-1} \\ r_1 & 1 & r_1 & \cdots & r_{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{n-1} & r_{n-2} & r_{n-3} & \cdots & 1 \end{pmatrix}$.

Here $r_j$ denotes the correlation coefficient for two variables whose $x$ values are
$j\delta$ units apart.

As before, it is necessary to introduce the spacing matrix $X$ given by the
formula

$X = \begin{pmatrix} 1 & \delta & \delta^2 & \cdots & \delta^k \\ 1 & 2\delta & (2\delta)^2 & \cdots & (2\delta)^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & n\delta & (n\delta)^2 & \cdots & (n\delta)^k \end{pmatrix}$.

The measure of efficiency in the estimation of the $\beta$'s that will be used here is
the generalized variance, or, equivalently, the square of the volume of the ellipsoid
of concentration. The generalized variance for best unbiased linear estimates
is expressible by means of a well known formula [1] in terms of the matrices
$X$ and $S$.


3. Least squares estimates. From a theorem in the book of Grenander and
Rosenblatt [2], it follows that the least squares estimates of the coefficients in
polynomial regression are asymptotically efficient for stationary processes. Therefore,
in studying asymptotic efficiency, least squares estimates may be used in
place of Markoff estimates, provided one is dealing with stationary processes.
Since least squares estimates possess a simpler generalized variance formula than
Markoff estimates, it is convenient to work with them in studying the asymptotic
efficiency of various spacing designs. For least squares estimates, the generalized
variance of polynomial regression coefficients is given [2] by the formula

(1) G.V. $= \frac{|X'SX|}{|X'X|^2}$.

Now consider any continuous correlation function $\rho(t)$ defined over the closed
interval $[0, a]$. Since it may be approximated arbitrarily closely by a finite series
of the form

(2) $\rho(t) = \sum_{m=1}^{N} c_m \exp\{-\alpha_m t\}$,

where $\alpha_m > 0$, it will be assumed that the correlation function is of this type.
Because $\rho(0) = 1$, it is necessary that $\sum c_m = 1$. In terms of this correlation
function the value of $r_j$ will be given by

$r_j = \rho(j\delta) = \sum_{m=1}^{N} c_m \exp\{-\alpha_m j\delta\}$.

Then $S$ assumes the form $S = \sigma^2 (w_{ij})$ where

$w_{ij} = \sum_{m=1}^{N} c_m \exp\{-\alpha_m \delta |i - j|\}$.

Let $S_m = (w_{ij}^{(m)})$ where $w_{ij}^{(m)} = \exp\{-\alpha_m \delta |i - j|\}$. Then $S$ may be written as

$S = \sigma^2 \sum_{m=1}^{N} c_m S_m$.

As a result

(3) $X'SX = \sigma^2 \sum_{m=1}^{N} c_m X'S_m X$.

Consider the typical term, $a_{i+1,j+1}^{(m)}$, in $X'S_m X$. Postmultiplying $S_m$ by $X$ and
premultiplying the result by $X'$ will show that $a_{i+1,j+1}^{(m)}$ can be expressed in
the form

$a_{i+1,j+1}^{(m)} = \sum_{x=1}^{n} \sum_{y=1}^{n} (x\delta)^i (y\delta)^j e^{-\alpha_m |x\delta - y\delta|}$.

If the substitutions $u = x\delta$ and $v = y\delta$ are made, it will be seen that this sum
possesses the same asymptotic value with respect to $1/\delta$ as the integral

(4) $\left(\frac{1}{\delta}\right)^2 \int_0^a \int_0^a u^i v^j e^{-\alpha_m |u - v|} \, du \, dv$.

The typical term in $X'SX$ will be denoted by $b_{i+1,j+1}$. From (3) it follows that

(5) $b_{i+1,j+1} = \sigma^2 \sum_{m=1}^{N} c_m a_{i+1,j+1}^{(m)}$,

and therefore that the asymptotic value of $b_{i+1,j+1}$ is obtained by substituting
the asymptotic value of $a_{i+1,j+1}^{(m)}$ into (5). In view of formulas (2) and (4), this
asymptotic value may be written in the form

(6) $b_{i+1,j+1} \sim \left(\frac{1}{\delta}\right)^2 \sigma^2 \int_0^a \int_0^a u^i v^j \rho(|u - v|) \, du \, dv$.

Similar considerations will show that the typical term in $X'X$ is given by the
single sum $\sum_{x=1}^{n} (x\delta)^i (x\delta)^j$, which possesses the asymptotic value

(7) $\left(\frac{1}{\delta}\right) \int_0^a u^{i+j} \, du = \left(\frac{1}{\delta}\right) \frac{a^{i+j+1}}{i+j+1}$.

It now follows from (1), (6), and (7) that the asymptotic value of the generalized
variance is given by

G.V. $\sim \frac{\left| \left(\frac{1}{\delta}\right)^2 \sigma^2 \int_0^a \int_0^a u^i v^j \rho(|u - v|) \, du \, dv \right|}{\left| \left(\frac{1}{\delta}\right) \frac{a^{i+j+1}}{i+j+1} \right|^2}$.

Since the factors in $1/\delta$ cancel, this asymptotic expression reduces to

(8) G.V. $\sim \frac{\left| \sigma^2 \int_0^a \int_0^a u^i v^j \rho(|u - v|) \, du \, dv \right|}{\left| \frac{a^{i+j+1}}{i+j+1} \right|^2}$.


4. Nonconstant variance. The preceding results were based on the assumption
that the process is stationary. This assumption is certainly a realistic one for
many applications, at least as far as the correlation function is concerned. A more
general situation, in which the variance is assumed to be a continuous function
of time in the closed interval $[0, a]$, can be treated by the same methods as those
just employed and will be found to yield similar conclusions with respect to
asymptotic efficiency.

The demonstration of this last statement can be carried out by considering $\sigma_t$
as a polynomial in $t$. The resulting change in the matrix $S_m$ will change the
integral (4) to the integral

(9) $\left(\frac{1}{\delta}\right)^2 \int_0^a \int_0^a u^i v^j e^{-\alpha_m |u - v|} \sigma_u \sigma_v \, du \, dv$,

where, say,

$\sigma_u = \gamma_0 + \gamma_1 u + \cdots + \gamma_s u^s$.

The substitution of this quantity in (9) will show that (9) reduces to

(10) $\left(\frac{1}{\delta}\right)^2 \sum_{p=0}^{s} \sum_{q=0}^{s} \gamma_p \gamma_q \int_0^a \int_0^a u^{i+p} v^{j+q} e^{-\alpha_m |u - v|} \, du \, dv$.

As a result, the asymptotic value of (10) is of the same form as for (4), and
therefore one would expect the same type of asymptotic efficiency properties to
hold. Such properties will be considered in the next section.


5. Adding observations. The result given by (8) shows that when a large
number of observations has been made in an interval the amount of information
gained by taking additional observations is negligible. Thus, if the number of
equally spaced points in an interval is doubled, which means that $\delta$ must be
replaced by $\delta/2$, the same asymptotic value of the generalized variance is obtained
because (8) does not depend upon $\delta$. This holds regardless of the nature of the
correlation function, provided that it is continuous. It holds not only for any
stationary process but also, in view of formula (10), for a stationary process
that has been modified to permit the variance to be any continuous function of $t$.
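
The insensitivity to $\delta$ can be illustrated by evaluating formula (1) directly for a finite number of points. The following is a rough sketch under stated assumptions, not a computation from the paper: it assumes the single-exponential correlation $\rho(t) = e^{-\alpha t}$, a first-degree polynomial ($k = 1$) by default, and $\sigma^2 = 1$; the function names `det` and `generalized_variance` are hypothetical. For contrast it can also treat the observations as independent.

```python
import math


def det(m):
    """Determinant by Laplace expansion (adequate for the small matrices here)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def generalized_variance(n, a=1.0, k=1, alpha=1.0, sigma2=1.0, independent=False):
    """Formula (1): |X'SX| / |X'X|^2 for n equally spaced points x_i = i * delta."""
    delta = a / n
    # Row i of the spacing matrix X: 1, x_i, x_i^2, ..., x_i^k.
    rows = [[((i + 1) * delta) ** d for d in range(k + 1)] for i in range(n)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)]
           for i in range(k + 1)]
    xsx = [[0.0] * (k + 1) for _ in range(k + 1)]
    for s in range(n):
        for t in range(n):
            if independent:
                w = sigma2 if s == t else 0.0
            else:
                # Assumed correlation rho(t) = exp(-alpha * t) at lag |s - t| * delta.
                w = sigma2 * math.exp(-alpha * delta * abs(s - t))
            if w == 0.0:
                continue
            for i in range(k + 1):
                for j in range(k + 1):
                    xsx[i][j] += rows[s][i] * w * rows[t][j]
    return det(xsx) / det(xtx) ** 2
```

Comparing `generalized_variance(400)` with `generalized_variance(200)` shows how little is gained by doubling the number of points in the correlated case, whereas the same comparison with `independent=True` shows the generalized variance continuing to shrink.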

If the $y$'s are independent random variables, the generalized variance will
approach zero as $\delta$ approaches zero, and therefore it certainly pays to add
observations in this case. In view of this fact, it is clear that the size of the sample needed
before one can conclude that it is hardly worth while taking additional observations
depends rather heavily upon the nature of the correlation function. For the
purpose of observing how the value of the generalized variance changes as the
nature of the correlation function changes, consider the special correlation function
$\rho(t) = e^{-\alpha t}$ that was considered in [1], and assume that $\sigma = 1$. Suppose the
value of $\alpha$ is changed to the value $2\alpha$. This is equivalent to squaring the value of
the correlation coefficient between any pair of $y$ values, and hence to weakening
the correlation relationship to this degree. For any particular value of $\alpha$ and $a$
this effect of doubling $\alpha$ can be determined numerically by means of formula (8);
however, it is difficult to make such a comparison for a general $a$ unless $\alpha$ is very
large. Therefore, consider next an approximation to (8) which is valid for large
values of $\alpha$.

Let

$I_{ij} = \int_0^a \int_0^u u^i v^j e^{-\alpha (u - v)} \, dv \, du$.

The value of $b_{i+1,j+1}$ in (6) will then be given by the quantity $(I_{ij} + I_{ji})/\delta^2$.
Repeated integration by parts in the first integration will show that $I_{ij}$ can be
expressed in the form

$I_{ij} = \frac{a^{i+j+1}}{(i+j+1)\alpha} - \frac{j \, a^{i+j}}{(i+j)\alpha^2} + \frac{j(j-1) \, a^{i+j-1}}{(i+j-1)\alpha^3} - \cdots + (-1)^j \frac{j! \, a^{i+1}}{(i+1)\alpha^{j+1}} + (-1)^{j+1} \frac{j!}{\alpha^{j+1}} \int_0^a u^i e^{-\alpha u} \, du$.

For large $\alpha$ the first term will dominate this expression; therefore $I_{ij}$ may be
approximated by

$\frac{a^{i+j+1}}{(i+j+1)\alpha}$.

The double integral in (8) will therefore be approximated by twice this value;
consequently, for large $\alpha$, the generalized variance in (8) may be approximated by

(11) G.V. $\approx \frac{\left| \frac{2}{\alpha} \frac{a^{i+j+1}}{i+j+1} \right|}{\left| \frac{a^{i+j+1}}{i+j+1} \right|^2} = \frac{(2/\alpha)^{k+1}}{A \, a^{(k+1)^2}}$,

where $A = |1/(i + j + 1)|$.
In making comparisons by means of the generalized variance, it is convenient
to consider the quantity introduced in [1], namely,

$\left[ \frac{G.V.(\alpha, a)}{G.V.(2\alpha, a)} \right]^{1/(k+1)}$.

The value of this quantity gives the number of replications of an experiment in
the given interval needed to yield the same estimation information, as measured
by the generalized variance, as that obtained through doubling the value of $\alpha$.
From (11) it follows that the value of this quantity is 2 here; therefore when a
large number of observations has been made, doubling the value of a large $\alpha$
yields the same estimation information as repeating the experiment. If the typical
element in the covariance matrix $(X'X)^{-1}(X'SX)(X'X)^{-1}$ is computed,
using the same approximation as before, it will be seen that doubling the value
of $\alpha$ multiplies all elements of this matrix by 1/2; therefore in the sense that the
variance of each estimate is multiplied by 1/2, the efficiency of estimation is doubled
through doubling $\alpha$.

The preceding results show that doubling the number of observation points in
an interval gives rise to two counteracting effects. The favorable effect arises
from doubling the size of the experiment. The unfavorable effect arises from
increasing the correlation between neighboring variables. For large samples, these
two effects approximately nullify each other. Since the preceding result, that
adding points does not help much here, holds for a large value of $\alpha$, and hence
for a weak correlation relationship, the advantage of adding observations in a
fixed interval would be expected to be even less when there exists a strong
correlation relationship. Some numerical results in this connection may be found
in [1].


6. Extending the interval. A second form of comparison which is of interest in
regression problems is that arising when the interval over which observations are
to be taken is extended. This comparison for the same correlation function as in
Section 5 can be made by replacing $a$ by $2a$ in (8) and then calculating the
quantity

(12) $\left[ \frac{G.V.(\alpha, a)}{G.V.(\alpha, 2a)} \right]^{1/(k+1)}$.

When $\alpha$ is large, the approximation given in (11) may be used, in which case
the value of this quantity will reduce to $2^{k+1}$. Thus, when $\alpha$ is large and a large
number of observations has been made, doubling the number of equally spaced
observations by doubling the length of the interval gives approximately as much
estimation information as $2^{k+1}$ replications of the experiment in the original
interval. This result was obtained in [1] by using other methods.
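
The two comparison quantities can also be evaluated from (8) without the large-$\alpha$ approximation. The sketch below is illustrative only and is not the numerical method used in the paper: it assumes $\rho(t) = e^{-\alpha t}$, uses the closed form obtained by repeated integration by parts for the integral of $u^i v^j e^{-\alpha(u - v)}$ over $0 \le v \le u \le a$, and the names `exp_moment`, `tri_integral`, `asymptotic_gv` and `replication_equivalent` are hypothetical.

```python
import math


def exp_moment(i, alpha, a):
    """Closed form of the integral of u**i * exp(-alpha * u) over [0, a]."""
    partial = sum((alpha * a) ** m / math.factorial(m) for m in range(i + 1))
    return math.factorial(i) / alpha ** (i + 1) * (1.0 - math.exp(-alpha * a) * partial)


def tri_integral(i, j, alpha, a):
    """Integral of u^i v^j exp(-alpha (u - v)) over the triangle 0 <= v <= u <= a."""
    total = 0.0
    for m in range(j + 1):
        power = i + j - m + 1
        falling = math.factorial(j) / math.factorial(j - m)  # j (j-1) ... (j-m+1)
        total += (-1) ** m * falling * a ** power / (power * alpha ** (m + 1))
    tail = math.factorial(j) / alpha ** (j + 1) * exp_moment(i, alpha, a)
    return total + (-1) ** (j + 1) * tail


def det(mat):
    """Determinant by Laplace expansion (adequate for small matrices)."""
    if len(mat) == 1:
        return mat[0][0]
    return sum(
        (-1) ** c * mat[0][c] * det([row[:c] + row[c + 1:] for row in mat[1:]])
        for c in range(len(mat))
    )


def asymptotic_gv(alpha, a, k, sigma2=1.0):
    """Right-hand side of (8) for the assumed correlation rho(t) = exp(-alpha t)."""
    size = range(k + 1)
    num = [[sigma2 * (tri_integral(i, j, alpha, a) + tri_integral(j, i, alpha, a))
            for j in size] for i in size]
    den = [[a ** (i + j + 1) / (i + j + 1) for j in size] for i in size]
    return det(num) / det(den) ** 2


def replication_equivalent(gv_base, gv_new, k):
    """(gv_base / gv_new) ** (1 / (k + 1)), the comparison quantity from [1]."""
    return (gv_base / gv_new) ** (1.0 / (k + 1))
```

For example, `replication_equivalent(asymptotic_gv(alpha, a, k), asymptotic_gv(alpha, 2 * a, k), k)` gives the quantity (12); for large `alpha` it can be compared with $2^{k+1}$, and for moderate `alpha` it gives the kind of numerical evaluation needed when approximation (11) is not justified.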

When $\alpha$ is not sufficiently large to justify the use of approximation (11),
numerical methods are needed to observe what effect doubling the length of the
interval has on the generalized variance. Thus, calculations for $\alpha = 1$, $a = 1$,
and $k = 2$ by means of formula (8) yielded the value 15 for the quantity given
in (12). Under independence the value would have been 8; therefore there appears
to be even greater advantage in extending the interval when strong correlation
exists than under independence.

MAXIMUM LIKELIHOOD ESTIMATION OF A LINEAR
FUNCTIONAL RELATIONSHIP

Republica, Montevideo, Uruguay


1. Introduction and summary. We shall consider the problem of estimating a
linear functional relationship

(1.1) $\alpha + \beta_1 \tau^1 + \cdots + \beta_p \tau^p = 0$

among $p$ variables $\tau^1, \cdots, \tau^p$ when the observed values do not satisfy it because
all of them are subject to errors or fluctuations (superscripts will, in general, be
indexing symbols, not powers, in this paper). Geometrically, the problem is
equivalent to fitting straight lines or planes to a series of $q$ observed points when
all the coordinates are subject to error. This problem has a long history. R. J.
Adcock, in two papers written in 1877 [2] and 1878 [3], solves it by minimizing
the sum of squares of the orthogonal distances from the points to the hyperplane
(1.1). Adcock and many other authors used the model

(1.2) $y_i = \tau_i + \epsilon_i$

where $y_i$ and $\tau_i$ are column vectors representing the observed and true points,
and the errors $\epsilon_i$ are independent random vectors with mean value zero. Since
the $\tau_i$ are points lying on the hyperplane (1.1), we have in matrix notation

(1.3) $\alpha + \beta \tau_i = 0$

where $\beta$ is a row vector with components $\beta_1, \cdots, \beta_p$. If we assume that the
$\tau_i$ are independently drawn from a probability distribution, then the estimate of
$\beta$ obtained by Adcock is not consistent. In fact, in 1937, J. Neyman [21] pointed
out that if the distribution of the true vectors $\tau_i$ and the errors $\epsilon_i$ is normal, then
the distribution of the observed vectors $y_i$ is also normal and, being determined
by moments of the first two orders, it is not sufficient to determine the parameters
$\alpha$ and $\beta$; the functional relationship (1.1) is, therefore, nonidentifiable. Several
methods have been proposed to overcome this difficulty, for which the reader is
referred to a recent general survey of the literature by A. Madansky [18], which
also contains an extensive bibliography. One approach is to assume that we know
the covariance matrix of the errors up to a numerical factor. As was shown, in
general, by C. F. Gauss [8], [9], in the case of independently and normally
distributed observations whose variances are known up to a numerical factor, the
maximum likelihood estimate is simply the weighted least-squares estimate. This
estimate of the linear functional relationship was obtained as early as 1879
by C. H. Kummell [15], for the case in which the components $\epsilon_i^h$ of the vectors
$\epsilon_i$ are independently distributed with variances $\kappa^{ii} \sigma_{hh}$, where the $\kappa^{ii}$ are known
constants and the $\sigma_{hh}$ are known only up to a numerical factor. Kummell found
that his estimate coincides with the estimate proposed by Adcock only in the
case in which all the variances are equal.

M. J. van Uven [24] considered the case in which the errors $\epsilon_i$ are independent
and have the same multivariate normal distribution with a covariance matrix
$\Sigma$ which is known only up to a numerical factor. His method is essentially the
following. He considers $\tau^1, \cdots, \tau^p$ as skew coordinates in a new, "isotropic"
space in which the rectilinear orthogonal coordinates are independent and have
the same variance. In the new space he then uses Adcock's principle of adjustment,
namely, he chooses as the estimate the hyperplane which minimizes the
sum of orthogonal distances. Later, T. Koopmans [14] showed that van Uven's
estimate is the maximum likelihood estimate for the case considered. If the $\tau_i$
are assumed independently drawn from a probability distribution, the estimate
of the linear functional relationship thus obtained is consistent, but the estimate
of $\Sigma$ converges in probability to $p^{-1}\Sigma$ (see also [16]). B. M. Dent [7] solved the
maximum likelihood equations in the case in which $\Sigma$ is not known, but, as was
shown later by D. V. Lindley [16], her estimates are not consistent, and
should, therefore, be rejected. More recently, J. Kiefer and J. Wolfowitz [13]
showed that, under certain conditions of identifiability, when the $\tau_i$ have a
probability distribution, the method of maximum likelihood, if properly applied,
yields consistent estimates of both the linear functional relationship and the
probability distribution of the $\tau_i$. However, Kiefer and Wolfowitz do not give
explicit expressions for the maximum likelihood estimates.

No difficulties with respect to the identifiability of the functional relationship
or with respect to the consistency of the estimates arise if we can replicate the
observations. The model is now, in matrix notation,

(1.4) $y_{ij} = \tau_i + \epsilon_{ij}$.

In general we have $n_i$ observed points for each of the true points $\tau_i$, and it is
assumed that not all the $\tau_i$ lie on a translated subspace of dimension smaller
than $p - 1$. Obviously this implies that $q \ge p$. This model has been considered
previously by G. W. Housner and J. F. Brennan [12], J. W. Tukey [23] and by
F. S. Acton [1] (see also [11] and [25]). If we assume that the errors $\epsilon_{ij}$ are
independently and normally distributed with a known covariance matrix $\Sigma$, we lose
nothing if we consider only the averages $y_{i\cdot} = n_i^{-1} \sum_j y_{ij}$. We have then the
same model (1.2), with the only difference that $y_i$ is replaced by $y_{i\cdot}$. If, however,
the covariance matrix is not known, we can now obtain in the usual way an
estimate $S$ of $\Sigma$. F. S. Acton [1] suggested the use of $S$ instead of $\Sigma$ in the estimate
obtained by the method of maximum likelihood in the case of known $\Sigma$. In this
paper it will be shown that the estimate thus obtained is the maximum likelihood
estimate when $\Sigma$ is unknown.

If the design is a (in general incomplete) block design, we have, if the treatment
$i$ is applied on the block $j$,

(1.5) $y_{ij} = \tau_i + b_j + \epsilon_{ij}$,

where $b_j$ is a column vector representing the block effect. Considering the block
effects $b_j$ as unknown constants, we get in the usual way the intrablock estimates
$t_i$ of the treatment effects $\tau_i$. Then, the same equation (1.2) still holds if $y_i$ is
replaced by $t_i$, but in general the errors $\epsilon_i$ and consequently also the estimates
$t_i$ will no longer be independent. If the design consists of $r$ replications of a basic
design, then the covariance of two errors $\epsilon_i$, $\epsilon_{i'}$ will be given by

(1.6) cov $(\epsilon_i, \epsilon_{i'}) = \frac{\kappa^{ii'} \Sigma}{r}$

where $\kappa^{ii'}$ are known coefficients and the matrix $\Sigma$ is unknown.

In this paper maximum likelihood estimates for $\Sigma$ and the parameters of the
linear functional relationship will be found for the case in which (1.6) holds. It
will be shown that the maximum likelihood estimates $\hat\alpha$, $\hat\beta$ in the case of
unknown $\Sigma$ are obtained from the corresponding estimates in the case of known $\Sigma$
by simply replacing $\Sigma$ by the linear regression estimate $S$. In the last Section it
will be shown that if the maximum likelihood method is applied directly to the
variables $y_{ij}$ instead of the variables $t_i$ and $S$, then the same estimate $\hat\beta$ is
obtained, but the estimate of $\Sigma$ is multiplied by $1 - k^{-1} + N^{-1}$, where $k$ is the
number of experimental units in each block and $N$ is the total number of
experimental units. All of the estimates obtained are consistent, with the exception
only of the estimate of $\Sigma$ obtained by the direct approach in the last
Section, which converges to $(1 - k^{-1})\Sigma$.


2. The model. We shall consider now in more detail the intrablock analysis
of a (in general incomplete) block design to which the additive model (1.5)
applies. We shall assume that errors coming from different experimental units
are independent, and that the errors coming from a single experimental unit
have a multivariate normal distribution with zero means and covariance matrix
$\Sigma = \{\sigma_{hh'}\}$. Therefore,

(2.1) cov $(\epsilon_{ij}^h, \epsilon_{ij}^{h'}) = \sigma_{hh'}$,

where $\epsilon_{ij}^h$ $(h = 1, \cdots, p)$ are the components of $\epsilon_{ij}$. If we do not consider the
linear functional relationship (1.1), the estimation of $\tau_i$ and $\Sigma$ is simply a linear
regression problem. In order to arrive at a unique solution $t_1, \cdots, t_q$, it is
usual to add some arbitrary linear restriction, say,

(2.2) $\sum_i w_i t_i = 0$,   $\sum_i w_i \ne 0$.

It is known ([5], Section 8.2) that the linear regression estimates $t_i$ are linear
combinations of the observed vectors $y_{ij}$. If the design consists of $r$ replications
of a basic design, then the covariance of two estimates $t_i$, $t_{i'}$ is given by (1.6),
where the $\kappa^{ii'}$ are known coefficients that depend on the basic design and the linear
restriction (2.2). Since the errors are assumed to be normally distributed, it
follows that the $t_i$ have a multivariate normal distribution with means $\tau_i$ and
covariances given by (1.6). Finally, the linear regression estimate of $\sigma_{hh'}$ is

(2.3) $s_{hh'} = \frac{1}{\nu} {\sum_{ij}}' e_{ij}^h e_{ij}^{h'}$

where $\nu = N - q - b + 1$, $b$ is the number of blocks, $e_{ij}^h$ is the linear regression
estimate of the error $\epsilon_{ij}^h$, and the prime over the summation sign indicates that
the sum must be extended over all pairs $(i, j)$ such that treatment $i$ appears on
block $j$. It is known ([5], Section 8.2) that the estimated covariance matrix
$S = \{s_{hh'}\}$ is independent of the $t_i$ and has a Wishart distribution with mean
value $\Sigma$ and $\nu$ degrees of freedom.

We have then the following linear functional relationship model. The $p$-dimensional
random variables $t_1, \cdots, t_q$ have a multivariate normal distribution, with
means $\tau_1, \cdots, \tau_q$ that satisfy (1.3) and covariances

(2.4) cov $(t_i, t_{i'}) = \frac{\kappa^{ii'} \Sigma}{r}$

where $r$ and the $\kappa^{ii'}$ are known coefficients, and $\Sigma$ is unknown. The matrix $S$ is
an unbiased estimate of $\Sigma$, is independent of the $t_i$ and has a Wishart distribution
with a number of degrees of freedom $\nu$ which tends to infinity when $r \to \infty$,
the quotient $r/\nu$ converging to a positive limit.

The matrix $K = \{\kappa^{ii'}\}$ is always nonnegative because, for a given $h$, $r^{-1} \sigma_{hh} K$
is the covariance matrix of the $h$th components $t_1^h, \cdots, t_q^h$ of $t_1, \cdots, t_q$. If the
$t_i$ are not subject to any linear restriction like (2.2), then for any $h$ the
distribution of $t_1^h, \cdots, t_q^h$ is of rank $q$ (see for example [6], p. 297) and the matrix $K$
is positive definite. If there is only one linear restriction (2.2), the matrix $K$ is
of rank $q - 1$ and

(2.5) $Kw = 0$

where $w$ is the column vector the components of which are $w_1, \cdots, w_q$ (see for
instance [17]).


3. Covariance matrix known. We shall consider in the first place the case in
which the covariance matrix $\Sigma$ is a known positive definite matrix and the $t_i$
are not subject to any linear restriction (and consequently $K$ is a positive definite
matrix).

From (2.4) it follows that the covariance matrix of all the variables $t_i^h$ is
$r^{-1} K \otimes \Sigma$, where the symbol $\otimes$ denotes the Kronecker product of two matrices.
The determinant of this covariance matrix is $r^{-pq} |K|^p |\Sigma|^q$. Therefore the
probability density function for $t_1^1, \cdots, t_q^p$ is, up to a numerical factor, equal to

(3.1) $|K|^{-p/2} |\Sigma|^{-q/2} e^{-rQ/2}$

where, if $K^{-1} = \{\kappa_{ii'}\}$,

$Q = \sum_{i, i'} \kappa_{ii'} (t_i - \tau_i)' \Sigma^{-1} (t_{i'} - \tau_{i'})$.

We shall denote the trace of a matrix $X$ by tr $X$. Then, in matrix notation,

(3.2) $Q = $ tr $\Sigma^{-1} (t - \tau) K^{-1} (t - \tau)'$,

where $t$ is the $p \times q$ matrix the $i$th column of which is $t_i$, and similarly, $\tau$ is
the $p \times q$ matrix the $i$th column of which is $\tau_i$. The maximum likelihood
estimates of $\alpha$, $\beta$ and $\tau$ are the values $\hat\alpha$, $\hat\beta$ and $\hat\tau$ that minimize (3.2) subject to the
conditions (1.3) which may be written

(3.3) $\alpha u + \beta \tau = 0$,

where $u$ is the row vector the $q$ components of which are equal to 1. This problem
was solved by Koopmans [14] for the particular case $K = I$. In what follows,
unless otherwise specified, the symbols $\alpha$, $\beta$ and $\tau$ will denote mathematical
variables (not true values or true parameters). In the first place we shall find the
minimum of (3.2) subject to the condition (3.3), for given values of $\alpha$ and $\beta$.
Since $\Sigma$ and $K$ are positive definite, uniquely defined positive square roots $\Sigma^{1/2}$
and $K^{1/2}$ exist (see [10], p. 166). Consider the change of variable

(3.4) $\delta = \Sigma^{-1/2} (t - \tau) K^{-1/2}$.

Since tr $XY$ = tr $YX$, (3.2) may be written, if $\delta_i$ is the $i$th column of $\delta$,

(3.5) $Q = $ tr $\delta \delta' = \sum_i \delta_i' \delta_i$;

that is, $Q$ is the sum of the squares of the distances from the origin to the points
$\delta_i$. The condition (3.3) is, in the new variables,

(3.6) $\gamma_i - \beta \Sigma^{1/2} \delta_i = 0$,   $(i = 1, \cdots, q)$

where $\gamma_i$ is the $i$th component of $(\alpha u + \beta t) K^{-1/2}$. We have to minimize the sum
of squares of the distances from the origin to the points $\delta_i$, subject to the condition
that each $\delta_i$ lies on the corresponding hyperplane (3.6). Note that these
are, in general, $q$ parallel hyperplanes. This was the principle from which M. J.
van Uven [24] derived his estimates for the case $K = I$. Obviously, the minimum
is reached when $\delta_i$ is on the perpendicular from the origin to the hyperplane
(3.6) and, therefore,

$\delta_i = \frac{\gamma_i \Sigma^{1/2} \beta'}{\beta \Sigma \beta'}$.

Going back to the old variables, we have

(3.7) $t - \tau = \frac{\Sigma \beta'}{\beta \Sigma \beta'} (\alpha u + \beta t)$.

By substitution of (3.7) into (3.2) it follows that the minimum value of $Q$, for
given values of $\alpha$ and $\beta$, is

(3.8) $Q_1 = \frac{(\alpha u + \beta t) K^{-1} (\alpha u + \beta t)'}{\beta \Sigma \beta'}$.
We shall now find the minimum value of $Q_1$ for a given $\beta$. By differentiation we
have $(\alpha u + \beta t) K^{-1} u' = 0$. If we set

$w = \frac{K^{-1} u'}{u K^{-1} u'}$

and $t_\cdot = tw = \sum t_i w_i$, where $w_1, \cdots, w_q$ are the components of $w$, we get

(3.9) $\alpha = -\beta t_\cdot$.

Since $Q_1$ tends to $+\infty$ when $\alpha \to \pm\infty$, it follows that the value $\alpha$ given by (3.9)
minimizes $Q_1$. From the definition of $w$ we have $uw = \sum w_i = 1$, so that if, as
happens with the usual matrices $K$, the $w_i$ are nonnegative, then $t_\cdot$ is a weighted
average of the vectors $t_i$, with weights $w_i$. If we write $\Delta t_i = t_i - t_\cdot$ and
$\Delta t = t - t_\cdot u$, we have at the minimum $\alpha u + \beta t = \beta \Delta t$, and, therefore, the minimum
value of $Q_1$ is

(3.10) $Q_2 = \frac{\beta F \beta'}{\beta \Sigma \beta'}$

where

(3.11) $F = \Delta t K^{-1} (\Delta t)'$

is a nonnegative matrix.
We now have to find the vector $\beta$ which minimizes $Q_2$. Consider the change of
variable $\beta = \mu \Sigma^{-1/2}$. Substituting into (3.10)

(3.12) $Q_2 = \frac{\mu \Sigma^{-1/2} F \Sigma^{-1/2} \mu'}{\mu \mu'}$.

The vector $\hat\mu$ which minimizes this expression is obviously any proper vector
of the smallest proper value of the nonnegative matrix $\Sigma^{-1/2} F \Sigma^{-1/2}$; that is, $\hat\mu$ is
given by

$\hat\mu (\Sigma^{-1/2} F \Sigma^{-1/2} - \lambda I) = 0$,

where $\lambda$ is the smallest root of the equation

$|\Sigma^{-1/2} F \Sigma^{-1/2} - \lambda I| = 0$.

In computations the following equivalent equations may be preferred. The
maximum likelihood estimate of $\beta$ is given by the equation

(3.13) $\hat\beta (F - \lambda \Sigma) = 0$

where $\lambda$ is the smallest root of

(3.14) $|F - \lambda \Sigma| = 0$.
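
For two variables the computation in (3.9), (3.11), (3.13) and (3.14) can be carried out by hand, since (3.14) is then a quadratic in $\lambda$. The following is a minimal sketch under stated assumptions, not the paper's general procedure: it takes $p = 2$ and $K = I$ (so that $t_\cdot$ is the plain average of the $t_i$), and the function name `fit_line_known_sigma` is hypothetical.

```python
import math


def fit_line_known_sigma(points, sigma):
    """Estimate (alpha, beta, lam) of alpha + beta . tau = 0 for p = 2 and K = I.

    points: list of observed vectors t_i given as (t1, t2) pairs
    sigma:  known 2x2 covariance matrix [[s11, s12], [s12, s22]]
    """
    q = len(points)
    # With K = I the weights are w_i = 1/q, so t. is the ordinary mean.
    m1 = sum(p[0] for p in points) / q
    m2 = sum(p[1] for p in points) / q
    # F = sum of outer products of the deviations (formula (3.11) with K = I).
    f11 = sum((p[0] - m1) ** 2 for p in points)
    f12 = sum((p[0] - m1) * (p[1] - m2) for p in points)
    f22 = sum((p[1] - m2) ** 2 for p in points)
    (s11, s12), (_, s22) = sigma
    # |F - lam * Sigma| = 0 written as a * lam^2 + b * lam + c = 0 (formula (3.14)).
    a = s11 * s22 - s12 ** 2
    b = -(f11 * s22 + f22 * s11 - 2.0 * f12 * s12)
    c = f11 * f22 - f12 ** 2
    disc = max(b * b - 4.0 * a * c, 0.0)
    lam = (-b - math.sqrt(disc)) / (2.0 * a)  # smallest root, since a > 0
    # Solve beta (F - lam * Sigma) = 0 (formula (3.13)); the matrix is singular.
    m11, m12, m22 = f11 - lam * s11, f12 - lam * s12, f22 - lam * s22
    if abs(m11) + abs(m12) >= abs(m12) + abs(m22):
        beta = (-m12, m11)
    else:
        beta = (-m22, m12)
    alpha = -(beta[0] * m1 + beta[1] * m2)  # formula (3.9)
    return alpha, beta, lam
```

The vector `beta` is determined only up to an arbitrary factor, so in use it is the ratios `beta[0] / beta[1]` and `alpha / beta[1]` that are meaningful. A general $K$ or $p > 2$ would need a proper symmetric eigenvalue routine in place of the quadratic formula.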


4. Covariance matrix unknown. We assume now that the covariance matrix
$\Sigma$ is unknown, but that we have an estimate $S$ which is independent of $t$ and has
a Wishart distribution with mean value $\Sigma$ and $\nu$ degrees of freedom, the
probability density of which is proportional to

(4.1) $|\Sigma|^{-\nu/2} \exp\{-\tfrac{1}{2} \nu \,$tr$\, \Sigma^{-1} S\}$

for $S$ positive definite and 0 otherwise, where $S = \{s_{hh'}\}$. It is assumed, as before,
that $\Sigma$ is positive definite, and that the $t_i$ are not subject to any linear restriction
(and, therefore, $K$ is positive definite). We shall consider only the case, which
happens with probability 1, in which $S$ is positive definite.

The joint probability density of all of the variables $t_i^h$, $s_{hh'}$ is proportional to
the product of (3.1) and (4.1). The maximum likelihood estimates are the
values $\hat\tau$, $\hat\alpha$, $\hat\beta$ and $\hat\Sigma$ that maximize

(4.2) $|\Sigma|^{-(q+\nu)/2} \exp\{-\tfrac{1}{2} \,$tr$\, \Sigma^{-1} [\nu S + r (t - \tau) K^{-1} (t - \tau)']\}$

subject to the condition (3.3). Instead of maximizing (4.2) we can minimize

(4.3) tr $\Sigma^{-1} [\nu S + r (t - \tau) K^{-1} (t - \tau)'] - (q + \nu) \log |\Sigma^{-1}|$.

We may in the first place keep $\alpha$, $\beta$ and $\tau$ fixed and find the value of $\Sigma$ which
minimizes (4.3). By Lemma 3.2.2 of Anderson's book [4] the maximum
likelihood estimate of $\Sigma$ is

(4.4) $\hat\Sigma = (q + \nu)^{-1} [\nu S + r (t - \hat\tau) K^{-1} (t - \hat\tau)']$

where $\hat\tau$ is the maximum likelihood estimate of $\tau$ (based on maximum likelihood
estimates of $\alpha$ and $\beta$ and the restraint (3.3)). Substituting this estimate of $\Sigma$ (as
a function of $\hat\alpha$, $\hat\beta$, $\hat\tau$) into (4.3) we see that $\hat\alpha$, $\hat\beta$, $\hat\tau$ must minimize

(4.5) $|\nu S + r (t - \tau) K^{-1} (t - \tau)'|$

subject to the aforementioned restraint. Consider the change of variable

(4.6) $\delta = S^{-1/2} (t - \tau) K^{-1/2}$.

We have then to minimize 

(4.7) vl + ré6’| 

subject to the following conditions, similar to (3.5), 

(4.8) vi — BS'8; = 0 


where, as in Section 3, y; is the ith component of (aw + 8t)K™ and 4; is the ith 
column of 6. We shall find in the first place the minimum of (4.7) for fixed values 
of a and 8. The expression (4.7) may be written 


v? + Dw? "tr + De??? + --- + Dz” 
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where D, is the sum of all the principal minors of order h of the matrix 64’. The 
elements of the matrix 66’ are the product-moments with respect to the origin 
of the system of points 6;, and consequently all the principal minors of 66’, and 
a fortiori all the coefficients D,, are non-negative. In particular, 


(4.9) D, = tr 6s’ = >> 856; 


is the sum of the squares of the distances from the origin to the points $\delta_i$. This
is a minimum when all the points $\delta_i$ are on the perpendicular from the origin to
the hyperplanes (4.8). Since these points are on a straight line which goes through
the origin, it is easily seen that at these points all of the principal minors of
$\delta\delta'$ of order $\ge 2$, and consequently also $D_2, \ldots, D_p$, vanish simultaneously.
Therefore, minimizing (4.7) is equivalent to minimizing (4.9) subject to the
conditions (4.8). The same problem was solved in Section 3, with the only
difference that we now have $S$ instead of $\Sigma$. Therefore, the maximum likelihood
estimate of $\beta$ is given by the equation

(4.10) $\hat\beta(F - lS) = 0$

where $l$ is the smallest root of

(4.11) $|F - lS| = 0.$

Equivalently, $\hat\beta = mS^{-1/2}$, where $m$ is any proper vector of the minimum proper
value $l$ of the nonnegative matrix $S^{-1/2}FS^{-1/2}$. The maximum likelihood estimate of
$\alpha$ is given by the same equation (3.9) as before, the maximum likelihood estimate
of $\tau$ is

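(4.12) $\hat\tau = t - \dfrac{S\hat\beta'\hat\beta\,\Delta t}{\hat\beta S\hat\beta'}$

and the maximum likelihood estimate of $\Sigma$ is given by (4.4), or, equivalently, by

(4.13) $\hat\Sigma = \dfrac{1}{q+\nu}\left\{\nu S + rl\,\dfrac{S\hat\beta'\hat\beta S}{\hat\beta S\hat\beta'}\right\}$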
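The computation behind (4.10) and (4.11) can be sketched numerically. The following is only an illustrative sketch, not the author's procedure: the function name `smallest_root_estimate` and the random stand-in matrices `F` and `S` are assumptions introduced here, and `S` is assumed symmetric positive definite.

```python
import numpy as np

def smallest_root_estimate(F, S):
    """Return (l, beta): l is the smallest root of |F - l S| = 0 and
    beta is a row vector with beta (F - l S) = 0, as in (4.10)-(4.11)."""
    # Symmetric inverse square root of S (S assumed positive definite).
    w, V = np.linalg.eigh(S)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    # The proper values of S^{-1/2} F S^{-1/2} are the roots of |F - l S| = 0.
    M = S_inv_half @ F @ S_inv_half
    vals, vecs = np.linalg.eigh(M)   # proper values in ascending order
    l = vals[0]
    m = vecs[:, 0]                   # proper vector of the smallest proper value
    beta = m @ S_inv_half            # beta = m S^{-1/2}
    return l, beta

# Hypothetical stand-ins for F and S, used only to exercise the function.
rng = np.random.default_rng(0)
t = rng.normal(size=(3, 6))
A = rng.normal(size=(3, 20))
F = t @ t.T
S = A @ A.T / 20
l, beta = smallest_root_estimate(F, S)
```

The vector `beta` is determined only up to an arbitrary factor, in agreement with the text.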

5. Consistency. When the number $r$ of replications tends to infinity, $S$ converges
in probability to $\Sigma$ and $F$ converges in probability to $\Phi = \Delta\tau K^{-1}(\Delta\tau)'$,
where $\Delta\tau = \tau - \tau_{.}u$ and $\tau_{.} = \tau w'$. The direction of the true vector $\beta$ is the only
direction orthogonal to all the vectors $\Delta\tau_i = \tau_i - \tau_{.}$, because all points $\tau_i$ lie
on the hyperplane (1.1) but do not lie on any translated subspace of smaller
dimension. Consequently, up to an arbitrary factor, the true vector $\beta$ is the only
vector such that $\beta\Delta\tau = 0$, and, since $K^{-1}$ is a positive definite matrix, it is also
the only vector such that $\beta\Phi\beta' = 0$. But, since $\Phi$ is a nonnegative matrix, $\beta$ is
the only vector (up to an arbitrary factor) such that $\beta\Phi = 0$. Let $\mu$ be defined
by $\beta = \mu\Sigma^{-1/2}$. Then $\mu$ is the only vector (up to an arbitrary factor) such that
$\mu\Sigma^{-1/2}\Phi\Sigma^{-1/2} = 0$. Therefore, the matrix $\Sigma^{-1/2}\Phi\Sigma^{-1/2}$ is singular, the smallest proper
value is 0, and its only proper vector is $\mu$. Since the proper vector is a continuous
function of the matrix, $m$ converges in probability¹ to $\mu$ and consequently $\hat\beta$
converges in probability to $\beta$ (up to an arbitrary factor). Since $\hat\beta\Delta t$ converges in
probability to $\beta\Delta\tau = 0$, it follows from (4.12) that $\hat\tau$ converges in probability
to the true matrix $\tau$. It follows then easily from (3.9) that $\hat\alpha$ converges to $\alpha$ and
from (4.4) that $\hat\Sigma$ converges to $\Sigma$ (since by assumption, the quotient $r/\nu$
converges to a positive limit).


6. Homogeneous linear functional relationship. We assume as before that
there is no linear restriction (2.2), and, therefore, $K$ is positive definite. If it is
known that $\alpha = 0$, the equation (3.8) may be written simply

(6.1) $Q = \dfrac{r\,\beta tK^{-1}t'\beta'}{\beta\Sigma\beta'}.$

Instead of defining $F$ by (3.11) we shall define $F$ by

(6.2) $F = tK^{-1}t',$

and we have

(6.3) $Q = \dfrac{r\,\beta F\beta'}{\beta\Sigma\beta'},$

which is similar to (3.10). We can then proceed by the same method that was
employed in Section 3. The results and formulae obtained there and in Section 4
will also apply to the homogeneous linear functional relationship case, provided
that it is understood that $F$ is given by (6.2) and not by (3.11).


7. Linear restrictions. We shall now assume that the $t_i$ are subject to a known
single linear restriction (2.2). In matrix notation

(7.1) $tw' = 0.$

In this section we shall assume that the coefficients $w_i$ are normalized so that

(7.2) $uw' = 1.$

Since $t$ is assumed to verify (7.1), it follows that it is impossible that $t = u$.
It follows also that we have

(7.3) $\tau w' = 0.$

If we multiply (3.3) on the right by $w'$ we obtain, by (7.2) and (7.3),

(7.4) $\alpha = 0.$

By (3.3) we have then

(7.5) $\beta\tau = 0.$


1 Because, if the vector valued function f is continuous at the point a, and if the random
vector x converges in probability to a, then f(x) converges in probability to f(a). This result
follows as a special case (yy constant) from Corollary 2 of [20]. (See also Lemma V of [19]).







Since there is only one linear restriction (2.2), the matrix $K$ is of rank $q - 1$,
as was pointed out in Section 2, and consequently, the previous theory cannot
be applied directly in this case. Let $L = \{L_{ii'}\}$ be a nonsingular $q \times q$ matrix
whose last column is the vector $w'$, i.e.,

(7.6) $L_{iq} = w_i$

and such that all other columns have a sum equal to zero, i.e.,

(7.7) $\sum_i L_{ii'} = 0 \quad (i' \ne q).$

Consider the new variables

(7.8) $t_{i'}^{*h} = \sum_i t_i^h L_{ii'}.$

By (2.2) and (7.6)

(7.9) $t_q^{*h} = 0 \quad (h = 1, \ldots, p).$

If we denote by $t^*$ the $p \times (q-1)$ matrix the elements of which are the $t_i^{*h}$
with $i \ne q$, and by $\tau^*$ and $\tau_i^{*h}$ the corresponding true values, we have in matrix
notation

(7.10) $(t^*|0) = tL, \qquad (\tau^*|0) = \tau L.$

Therefore, if we multiply (7.5) on the right by $L$, we obtain

(7.11) $\beta\tau^* = 0.$

Consequently, the new variables $\tau_i^{*h}$ ($i \ne q$) satisfy a homogeneous linear
functional relationship with the same parameter $\beta$. It can be easily seen that the
covariance matrix of the $pq$ variables $t_i^{*h}$ is $r^{-1}L'KL \otimes \Sigma$. From (2.5) and (7.6) it
follows that

(7.12) $L'KL = \begin{pmatrix} K^* & 0 \\ 0 & 0 \end{pmatrix}$

where $K^*$ is a $(q-1) \times (q-1)$ matrix. Since $L$ is nonsingular, and $K$ has
rank $q - 1$, it follows that $L'KL$ has also rank $q - 1$ and, therefore, $K^*$ is
nonsingular. Since $r^{-1}K^* \otimes \Sigma$ is the covariance matrix of the $p(q-1)$ variables
$t_i^{*h}$ ($i \ne q$), in order to find the maximum likelihood estimates, in the case of
unknown $\Sigma$, we have to minimize

(7.13) $\operatorname{tr}\Sigma^{-1}[\nu S + r(t^*-\tau^*)K^{*-1}(t^*-\tau^*)'] - (q+\nu)\log|\Sigma^{-1}|$

subject to the only restriction (7.11). This is a homogeneous linear functional
relationship problem of the type discussed in the previous section. Therefore,
the maximum likelihood estimates $\hat\beta$, $\hat\Sigma$ are given by (4.10) and (4.13), where $l$
is, as before, the smallest root of (4.11) and $F$ is given by

(7.14) $F = t^*K^{*-1}t^{*\prime}.$
We shall now show that we also have

(7.15) $F = t(K + \kappa_0 u'u)^{-1}t',$

where $\kappa_0$ is an arbitrary number different from zero. From (7.12) and (7.7) it
follows that

(7.16) $L'(K + \kappa_0 u'u)L = \begin{pmatrix} K^* & 0 \\ 0 & \kappa_0 \end{pmatrix}$

and, consequently, $K + \kappa_0 u'u$ is nonsingular. Moreover, the right-hand member of
(7.15) is equal to

$tL\{L'(K + \kappa_0 u'u)L\}^{-1}L't'$

and, therefore, by (7.10) and (7.16) is equal to the right-hand member of (7.14).
In practical applications the expression (7.15) will be used in preference to
(7.14). Moreover, in the case of balanced designs, all of the elements that are
not in the diagonal of $K$ will have a common value $\kappa' \ne 0$. By choosing $\kappa_0 = -\kappa'$,
the matrix $K + \kappa_0 u'u$ is a diagonal matrix and, consequently, the computations
are considerably simplified.
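The equality of (7.14) and (7.15) can be checked numerically. The sketch below is only an illustration under stated assumptions: it reads $u$ as the row vector of ones and $w$ as a weight vector with $uw' = 1$, it builds a random $K$ of rank $q-1$ with $Kw' = 0$ and a random $t$ with $tw' = 0$, and every variable name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 5
u = np.ones(q)                          # assumed: u is the row vector of ones
w = rng.uniform(0.5, 1.5, size=q)
w /= w.sum()                            # normalization u w' = 1, as in (7.2)

# Projector P with P w' = 0; used to build K (K w' = 0) and t (t w' = 0).
P = np.eye(q) - np.outer(w, w) / (w @ w)
C = rng.normal(size=(q, q))
K = P @ C @ C.T @ P                     # symmetric, rank q - 1
t = rng.normal(size=(p, q)) @ P         # satisfies the restriction (7.1)

# L: last column is w', every other column sums to zero, as in (7.6)-(7.7).
L = np.zeros((q, q))
for j in range(q - 1):
    L[j, j], L[j + 1, j] = 1.0, -1.0
L[:, -1] = w

t_star = (t @ L)[:, :-1]                # (t* | 0) = t L, as in (7.10)
K_star = (L.T @ K @ L)[:-1, :-1]        # leading block of L'KL, as in (7.12)
F_714 = t_star @ np.linalg.inv(K_star) @ t_star.T

def F_715(kappa0):
    """Right-hand member of (7.15) for a given nonzero kappa0."""
    return t @ np.linalg.inv(K + kappa0 * np.outer(u, u)) @ t.T
```

Any nonzero `kappa0` should give the same matrix, which is what makes the choice $\kappa_0 = -\kappa'$ for balanced designs harmless.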

A similar argument shows that the maximum likelihood estimates are also
the values that minimize

(7.17) $\operatorname{tr}\Sigma^{-1}[\nu S + r(t-\tau)(K + \kappa_0 u'u)^{-1}(t-\tau)'] - (q+\nu)\log|\Sigma^{-1}|$

subject to the conditions (7.3) and (7.5).


8. Intrablock analysis: direct approach. We shall now estimate the linear
functional relationship by applying directly the maximum likelihood method to
the model (1.5), considering the coefficients $b_j$ as unknown constant vectors
(intrablock analysis). In order to arrive at a unique solution $\hat\tau$ we add as usual
the linear restriction (2.2). It follows then, as was shown in the previous section,
that $\alpha = 0$. As was already mentioned in Section 2, we assume that the errors
coming from different experimental units are independent, and that the errors
coming from a single experimental unit have a multivariate normal distribution
with zero means and covariance matrix $\Sigma$. If there are $N$ experimental units,
then the probability density for all $pN$ variables $y_{ij}^h$ is proportional to

(8.1) $|\Sigma^{-1}|^{N/2}\exp\{-\tfrac{1}{2}\sum_{h,h'}\sigma^{hh'}Q_{hh'}\}$

where

(8.2) $Q_{hh'} = \sum_{i,j} e_{ij}^h e_{ij}^{h'}.$


The maximum likelihood estimates are the values that maximize (8.1) subject
to the conditions (1.3) and (2.2), or, equivalently, the conditions (7.3) and
(7.5). Let $\bar y_{.j}^h$, $\bar t_{.j}^h$ and $\bar\tau_{.j}^h$ denote the averages of yields, estimated and true
treatment effects for the $h$th variable over all experimental units of block $j$. Define
the adjusted yields and adjusted treatment effects by

$\tilde y_{ij}^h = y_{ij}^h - \bar y_{.j}^h, \qquad \tilde t_{ij}^h = t_i^h - \bar t_{.j}^h, \qquad \tilde\tau_{ij}^h = \tau_i^h - \bar\tau_{.j}^h,$

where $\tau_i^h$ is the $h$th component of $\tau_i$.


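It can be easily shown that, if $\hat\tau_i^h$ is the maximum likelihood estimate of $\tau_i^h$,
then the maximum likelihood estimate of $b_j^h$ is $\hat b_j^h = \bar y_{.j}^h - \bar{\hat\tau}_{.j}^h$, where $\bar{\hat\tau}_{.j}^h$ denotes
the average of the $\hat\tau_i^h$ for all treatments occurring in block $j$. By substitution into
(8.1) and (8.2) it follows that the maximum likelihood estimates $\hat\beta$, $\hat\tau$, $\hat\Sigma$ are the
values that maximize, subject to the conditions (7.3) and (7.5), the expression

(8.3) $|\Sigma^{-1}|^{N/2}\exp\{-\tfrac{1}{2}\sum_{h,h'}\sigma^{hh'}\tilde Q_{hh'}\}$

where

(8.4) $\tilde Q_{hh'} = \sum_{i,j}(\tilde y_{ij}^h - \tilde\tau_{ij}^h)(\tilde y_{ij}^{h'} - \tilde\tau_{ij}^{h'}).$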

tJ 


Suppose that we have numbered serially the $N$ experimental units, and let
$\tilde y_n^h$, $\tilde\tau_n^h$, $\tilde t_n^h$ denote the adjusted yield, and the adjusted treatment effects (true
and estimated) for the $n$th experimental unit and the $h$th characteristic being
measured. If $\tilde y^h$, $\tilde\tau^h$, $\tilde t^h$ denote the row vectors the $N$ components of which are the
corresponding experimental unit values, we have

(8.5) $\tilde Q_{hh'} = (\tilde y^h - \tilde\tau^h)(\tilde y^{h'} - \tilde\tau^{h'})'.$


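From the definition of $\tilde\tau^h$ it follows that

(8.6) $\tilde\tau^h = \sum_i c_i\tau_i^h, \qquad \tilde t^h = \sum_i c_i t_i^h,$

where the $c_i$ are row vectors that depend only on the experimental design. Since
the $t_i^h$ are the values that minimize $\tilde Q_{hh}$, subject to the only conditions

(8.7) $\sum_i w_i t_i^h = 0 \quad (h = 1, \ldots, p)$

by differentiation with respect to $t_i^h$ we have, if $\lambda$ is a Lagrange multiplier,

$c_i(\tilde y^h - \tilde t^h)' + \lambda w_i = 0.$

If we multiply this expression by $t_i^{h'}$ and add for $i = 1, \ldots, q$ we have by (8.6)
and (8.7)

(8.8) $\tilde t^{h'}(\tilde y^h - \tilde t^h)' = 0.$

If we multiply instead by $\tau_i^{h'}$ we have

(8.9) $\tilde\tau^{h'}(\tilde y^h - \tilde t^h)' = 0.$

By (8.8) and (8.9) it follows then from (8.5) that

$\tilde Q_{hh'} = (\tilde y^h - \tilde t^h)(\tilde y^{h'} - \tilde t^{h'})' + (\tilde t^h - \tilde\tau^h)(\tilde t^{h'} - \tilde\tau^{h'})'$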







and by (2.3)

(8.10) $\tilde Q_{hh'} = \nu s_{hh'} + (\tilde t^h - \tilde\tau^h)(\tilde t^{h'} - \tilde\tau^{h'})'.$
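Let $\lambda_{ii'}$ be the number of blocks where both treatments $i$ and $i'$ are applied (and
consequently $\lambda_{ii}$ will be the number of replications of the treatment $i$). Let $\Lambda =
\{\lambda_{ii'}\}$ and let $\Lambda_d$ be the diagonal matrix whose diagonal is $\lambda_{11}, \ldots, \lambda_{qq}$. Assume
that all blocks contain $k$ experimental units and define the matrix $\tilde K = \{\tilde\kappa_{ii'}\}$ by

(8.11) $\tilde K = \dfrac{1}{r}\left(\Lambda_d - \dfrac{\Lambda}{k}\right) + w'w.$

It can be easily shown that

$(\tilde t^h - \tilde\tau^h)(\tilde t^{h'} - \tilde\tau^{h'})' = r\sum_{i,i'}(\tilde\kappa_{ii'} - w_i w_{i'})(t_i^h - \tau_i^h)(t_{i'}^{h'} - \tau_{i'}^{h'}).$

Therefore, in matrix notation we have, by (7.1) and (7.3),

(8.12) $\sum_{h,h'}\sigma^{hh'}\tilde Q_{hh'} = \operatorname{tr}\Sigma^{-1}[\nu S + r(t-\tau)\tilde K(t-\tau)'].$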


Let $Y$ be the matrix of the adjusted total yields $Y_i^h = \sum_j \tilde y_{ij}^h$. It can be shown
that (see for instance [4], p. 251) $Y = t(\Lambda_d - \Lambda/k)$. Then, by (8.11),

(8.13) $Y = rt(\tilde K - w'w).$

We shall show that the system of equations (7.1) and (8.13) is equivalent to

(8.14) $rt\tilde K = Y.$

It is obvious that (8.14) is a consequence of (7.1) and (8.13). From the
definitions of $\Lambda$ and $Y$ it follows that $\Lambda u' = k\Lambda_d u'$ and $Yu' = 0$. By (8.11) we have then

(8.15) $\tilde K u' = w'wu'.$

Therefore, if we multiply (8.14) on the right by $u'$, we obtain the equation
(7.1). From (7.1) and (8.14) the equation (8.13) follows immediately. We
assume now that the design of the experiment is such that the system of equations
(7.1) and (8.13) has a unique solution, $t$. Then it follows that (8.14) has a
unique solution, and, therefore, that $\tilde K$ is nonsingular. If $\tilde K^{-1} = \{\tilde\kappa^{ii'}\}$ is the
inverse matrix, then


and, therefore, 


h h’ 
cov (4; , bi 


It can be shown that 


cov (Y}, Y}) 
and, therefore, (1.6) holds, with


In matrix notation $K = \tilde K^{-1} - \tilde K^{-1}w'(\tilde K^{-1}w')'$ and by (8.15)

(8.16) $\tilde K^{-1} = K + \kappa_0 u'u,$

where $\kappa_0 = (uw')^{-2}$ is an arbitrary positive number. By substitution into (8.12)
we have

$\sum_{h,h'}\sigma^{hh'}\tilde Q_{hh'} = \operatorname{tr}\Sigma^{-1}[\nu S + r(t-\tau)(K + \kappa_0 u'u)^{-1}(t-\tau)'].$

To maximize (8.3) subject to the conditions (7.3) and (7.5) is then equivalent
to minimizing

(8.17) $\operatorname{tr}\Sigma^{-1}[\nu S + r(t-\tau)(K + \kappa_0 u'u)^{-1}(t-\tau)'] - N\log|\Sigma^{-1}|$


subject to the same conditions. But in Section 7 the same problem was solved,
with the only difference being that we now have $N$ instead of $q + \nu = N - b + 1$,
where $b$ is the number of blocks. The estimate $\hat\beta$ is, therefore, the same as before,
but instead of (4.13) we have now

(8.18) $\hat\Sigma = N^{-1}\{\nu S + rl[S\hat\beta'\hat\beta S/\hat\beta S\hat\beta']\}.$

This expression is equal to the estimate (4.13) multiplied by $1 - k^{-1} + N^{-1}$.
Therefore, since (4.13) converges in probability to the true value $\Sigma$, the estimate
(8.18) converges in probability to $(1 - k^{-1})\Sigma$, and consequently, it is
inconsistent. This fact is explained by the existence in the model (1.5) of an indefinitely
increasing number of incidental parameters $b_j$.
The same inconsistency is found in linear regression analysis, i.e., when we
drop the restriction (1.1). The maximum likelihood estimate of $\Sigma$ is then
$(\nu/N)S$. When $r$ tends to infinity, $\nu/N \to 1 - k^{-1}$ and, therefore, since $S$ is a
consistent estimate of $\Sigma$, it follows that the maximum likelihood estimate of $\Sigma$ in
linear regression converges also to $(1 - k^{-1})\Sigma$. Obviously, this happens also in
the ordinary univariate analysis of block designs, as was pointed out by J. Neyman
and E. L. Scott ([22], Example 2) in the case of a block design with the same
treatment applied to all experimental units.
SEQUENTIAL $\chi^2$- AND $T^2$-TESTS¹


By J. EDWARD JACKSON² AND RALPH A. BRADLEY²
Virginia Polytechnic Institute

1. Summary. Consider a multivariate normal population with mean
$\mu = (\mu_1, \ldots, \mu_p)$ and covariance matrix $\Sigma$. Let $\mu_0$ be a vector of constants,
${}_n\bar x$ a vector of sample means based on $n$ observations, and $S_n$ the corresponding
sample covariance matrix. The statistics considered are

(1.1) $\chi_n^2 = n({}_n\bar x - \mu_0)\Sigma^{-1}({}_n\bar x - \mu_0)'$

and

(1.2) $T_n^2 = n({}_n\bar x - \mu_0)S_n^{-1}({}_n\bar x - \mu_0)'.$

It is shown that probability-ratio tests for a sequential test of the composite
hypothesis,

(1.3) $H_0: (\mu - \mu_0)\Sigma^{-1}(\mu - \mu_0)' = \lambda_0^2$

against the alternative

(1.4) $H_1: (\mu - \mu_0)\Sigma^{-1}(\mu - \mu_0)' = \lambda_1^2$

may be based on

(1.5) $p_{1n}/p_{0n} = \exp[-n(\lambda_1^2 - \lambda_0^2)/2]\;{}_0F_1(p/2;\, n\lambda_1^2\chi_n^2/4)\,/\,{}_0F_1(p/2;\, n\lambda_0^2\chi_n^2/4)$

when $\Sigma$ is known and

(1.6) $p_{1n}/p_{0n} = \exp[-n(\lambda_1^2 - \lambda_0^2)/2]\;{}_1F_1[n/2,\, p/2;\, n\lambda_1^2T_n^2/2(n - 1 + T_n^2)]\,/\,{}_1F_1[n/2,\, p/2;\, n\lambda_0^2T_n^2/2(n - 1 + T_n^2)]$

when $\Sigma$ is unknown and must be estimated from the sample. The sequential
$\chi^2$-test is associated with (1.5) and the sequential $T^2$-test with (1.6). ${}_0F_1$ and
${}_1F_1$ are respectively forms of the generalized hypergeometric function
${}_pF_q(a_1, \ldots, a_p;\, c_1, \ldots, c_q;\, x)$, the second being the confluent hypergeometric
function.

It is shown that the use of these probability ratios in sequential tests results in
Type I and Type II errors of approximately $\alpha$ and $\beta$ when these values are used
to obtain bounds on the probability ratios in the traditional way. It is also shown
that the sequential tests terminate with probability unity. Bounds on the
probability ratios are translated into bounds on $\chi_n^2$ and $T_n^2$ themselves and tables
have been prepared with more tables in preparation.

Procedures are also given to test sequentially whether or not two samples
come from populations with the same means. The $\chi^2$-test is generalized to give
simultaneous sequential tests on both the means and the covariance matrix.

The average sample number functions (ASN functions) are considered and
approximations to them suggested. The operating characteristic functions
(OC functions) are difficult to investigate and essentially are only known
approximately at $\lambda_0^2$ and $\lambda_1^2$.


2. Introduction. Modern techniques of sequential analysis were largely inspired 
by the work of Wald, summarized in his book [27]. This work was motivated by 
the need to cut down on the amount of work necessary in the acceptance sam- 
pling of military supplies. 

Wald’s procedures are based on a probability-ratio test and were largely 
developed for the test of a simple hypothesis against a simple alternative. For 
composite hypotheses, Wald proposed a method of weight functions but the method 
is cumbersome and no method of insuring optimum weights is available. Goldberg, 
as reported by Wallis [28], and Nandi [19] proposed a method of frequency func- 
tions. This method is now generally used in considering composite hypotheses 
and will be used here. 

The method of frequency functions was used to develop the sequential t-test 
in the univariate case. This work was done independently by Rushton [24] [25] 
and Arnold [21]. We extend these methods to the multivariate problem to obtain 
the sequential $T^2$-test and also consider a sequential $\chi^2$-test. Applications are of
consequence for, in the inspection of complex items, a number of characteristics 
are measured. Observations on these characteristics are often correlated and 
univariate sequential methods applied to each characteristic lead to confusion. 

For the method of frequency functions, observations are successive values of 
the test-statistic and hence observations are no longer independent. Wald showed 
that the probability-ratio test could be used to obtain bounds on the test-statistic 
even though observations are not independent, but his work on termination, 
OC functions, and ASN functions no longer applies. Barnard [2] [3] and Cox [5] 
have independently established conditions under which the frequency function 
of a test statistic might be used in a sequential probability-ratio test and still 
guarantee approximately the risks $\alpha$ and $\beta$. We quote Cox's theorem for later use.

THEOREM. "Let $x = \{x_1, \ldots, x_n\}$ be random variables whose probability density
function (p.d.f.) depends on unknown parameters $\theta_1, \ldots, \theta_p$. The $x_i$ themselves
may be vectors. Suppose that

(i) $t_1, \ldots, t_p$ are a functionally independent jointly sufficient set of estimators
for $\theta_1, \ldots, \theta_p$;

(ii) the distribution of $t_1$ involves $\theta_1$ but not $\theta_2, \ldots, \theta_p$;

(iii) $u_1, \ldots, u_m$ are functions of $x$ functionally independent of each other and
of $t_1, \ldots, t_p$;

(iv) there exists a set $S$ of transformations of $x = \{x_1, \ldots, x_n\}$ into
$x^* = \{x_1^*, \ldots, x_n^*\}$ such that

(a) $t_1, u_1, \ldots, u_m$ are unchanged by all transformations in $S$;

(b) the transformation of $t_2, \ldots, t_p$ into $t_2^*, \ldots, t_p^*$ defined by each
transformation in $S$ is one-to-one;

(c) if $T_2, \ldots, T_p$ and $T_2^*, \ldots, T_p^*$ are two sets of values of $t_2, \ldots, t_p$ each
having non-zero probability density under at least one of the distributions of $x$,
then there exists a transformation in $S$ such that if $t_2 = T_2, \ldots, t_p = T_p$, then
$t_2^* = T_2^*, \ldots, t_p^* = T_p^*$.

Then the joint p.d.f. of $t_1, u_1, \ldots, u_m$ factorizes into

$g(t_1 \mid \theta_1)\, l(u_1, \ldots, u_m, t_1),$

where $g$ is the p.d.f. of $t_1$ and $l$ does not involve $\theta_1$." The proof of this theorem is
given in Cox's paper.

The application of the theorem is straightforward. The theorem permits
factorization of the sample p.d.f. in such a way that the probability ratio attempted
from the sample p.d.f.'s under null and alternative hypotheses reduces to the
probability ratio for the test statistic under null and alternative hypotheses.
The composite hypotheses have been reduced to simple hypotheses on a single
parameter involved in the distributions of the test statistics. In repetition,

(2.1) $p_{1n}/p_{0n} = g(t_{1n} \mid \theta_1)/g(t_{1n} \mid \theta_0)$

in the notation of the theorem and where $t_{1n}$ is the statistic $t_1$ based on $n$
observations. The sequential test is as follows:

(i) Accept $H_0$ if $p_{1n}/p_{0n} \le \beta/(1 - \alpha)$.
(ii) Accept $H_1$ if $p_{1n}/p_{0n} \ge (1 - \beta)/\alpha$.
(iii) Continue sampling if $\beta/(1 - \alpha) < p_{1n}/p_{0n} < (1 - \beta)/\alpha$.

Provided that the probability is one that the test terminates, the probabilities
of error under the null and alternative hypotheses are approximately $\alpha$ and $\beta$
respectively (Wald [27], p. 43).
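The three-way rule above can be sketched in a few lines. This is a minimal illustration only; the function name `sprt_decision` and the returned labels are assumptions introduced here, not notation from the paper.

```python
def sprt_decision(ratio, alpha, beta):
    """Apply rules (i)-(iii) to a probability ratio p_1n / p_0n."""
    lower = beta / (1.0 - alpha)    # acceptance bound for H0
    upper = (1.0 - beta) / alpha    # acceptance bound for H1
    if ratio <= lower:
        return "accept H0"
    if ratio >= upper:
        return "accept H1"
    return "continue"

# Example call with alpha = 0.05 and beta = 0.10.
decision = sprt_decision(1.0, 0.05, 0.10)
```

In use, the ratio would be recomputed after each new observation and the rule applied until it stops returning "continue".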

We test $H_0$ of (1.3) against $H_1$ of (1.4) using $\chi_n^2$ in (1.1) or $T_n^2$ in (1.2)
depending on whether $\Sigma$ is known or not. For the sequential $\chi^2$-test, the probability
ratio is the ratio of two non-central $\chi^2$-densities as shown in (1.5). For the
sequential $T^2$-test, the probability ratio is the ratio of two non-central $T^2$-densities as
shown in (1.6). In both (1.5) and (1.6) some simple combinations of terms
have been made to obtain the forms shown.
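As a rough illustration of the two statistics, the sketch below computes $\chi_n^2$ and $T_n^2$ from an $n \times p$ data array. It assumes that $S_n$ is the usual sample covariance matrix with divisor $n - 1$, which is not confirmed by the text shown here; the function names are hypothetical.

```python
import numpy as np

def chi2_stat(X, mu0, Sigma):
    """chi_n^2 = n (xbar - mu0) Sigma^{-1} (xbar - mu0)' for known Sigma."""
    n = X.shape[0]
    d = X.mean(axis=0) - mu0
    return n * (d @ np.linalg.solve(Sigma, d))

def t2_stat(X, mu0):
    """T_n^2 = n (xbar - mu0) S_n^{-1} (xbar - mu0)' with S_n estimated."""
    n = X.shape[0]
    d = X.mean(axis=0) - mu0
    # Assumption: S_n uses divisor n - 1 (NumPy's default for np.cov).
    S = np.atleast_2d(np.cov(X, rowvar=False))
    return n * (d @ np.linalg.solve(S, d))

# Small univariate example (p = 1), where T^2 is the square of the t statistic.
X = np.array([[1.0], [2.0], [4.0], [7.0]])
mu0 = np.array([2.0])
t2 = t2_stat(X, mu0)
```

For a sequential test, the same functions would be applied to the first $j$ rows of the data for each successive $j$.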


3. Fulfillment of the conditions of Cox’s Theorem. Verification of the conditions 
of Cox’s Theorem has not been included by other authors using the theorem. 
Since we have not found the necessary verifications trivial either for this paper 
or others already published, we include a sketch of the required demonstrations. 

Let $x$ be the vector $(x_1, \ldots, x_n)$ where $x_i$ is the $i$th observation vector on
$p$ multivariate normal variates. Then $x$ consists of $n$ independent, equally
distributed, multivariate normal observation vectors. The vector of variate means
is $\mu$ and the dispersion matrix is $\Sigma$, assumed known for the $\chi^2$-test and unknown
for the $T^2$-test. The sample covariance matrix corresponding to $\Sigma$ is $S_n$. We may
let $\tilde x_i = x_i - \mu_0$ and $\tilde x = (\tilde x_1, \ldots, \tilde x_n)$ and regard $\tilde x$ as the original observation
matrix of the Cox Theorem. Note that $S_n$ is invariant under such changes in
location.

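CONDITION (i). A vector of sufficient statistics for the elements of $\mu$ is ${}_n\bar x$ and
for $\tilde\mu = \mu - \mu_0$ is ${}_n\bar{\tilde x}$; the elements of $S_n$ and ${}_n\bar x$ are sufficient for $\mu$ and $\Sigma$. (Cf.
Anderson [1], Sec. 3.3.3.) Define $GG' = \Sigma^{-1}$ and $EE' = S_n^{-1}$. We transform so that

(3.1) ${}_ny' = n^{1/2}A'({}_n\bar x - \mu_0)' = n^{1/2}A'\,{}_n\bar{\tilde x}'$

and

(3.2) $\eta' = G'(\mu - \mu_0)' = G'\tilde\mu'$

with $A = G$ for the $\chi^2$-test and $A = E$ for the $T^2$-test. A spherical transformation
is used to transform ${}_ny$ to $\chi_n^2$ or $T_n^2$, $a_{1n}, \ldots, a_{p-1,n}$, with $\chi_n^2$ or $T_n^2 \ge 0$, $0 \le a_{in} \le \pi$,
$i = 1, \ldots, (p - 2)$, $0 \le a_{p-1,n} \le 2\pi$. A similar transformation transforms
$\eta$ to $\lambda^2 = (\mu - \mu_0)\Sigma^{-1}(\mu - \mu_0)'$, $\alpha_1, \ldots, \alpha_{p-1}$, with $\lambda^2 \ge 0$, $0 \le \alpha_i \le \pi$, $i = 1, \ldots,
(p - 2)$, $0 \le \alpha_{p-1} \le 2\pi$. The transformations on ${}_n\bar{\tilde x}$ and ${}_ny$ and on $\tilde\mu$ and $\eta$ are
one-to-one and it follows that $\chi_n^2, a_{1n}, \ldots, a_{p-1,n}$ are a set of sufficient statistics
for $\lambda^2, \alpha_1, \ldots, \alpha_{p-1}$ and $T_n^2, a_{1n}, \ldots, a_{p-1,n}, S_n$ for $\lambda^2, \alpha_1, \ldots, \alpha_{p-1}, \Sigma$. We
associate the sets of statistics with $t_1, \ldots, t_p$ in the theorem and the sets of
parameters with $\theta_1, \ldots, \theta_p$ of the theorem; condition (i) follows.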

CONDITION (ii). It was shown by Fisher [8] that the marginal distribution of
$\chi_n^2$ involves only $\lambda^2$ and Hsu [14] and Bose and Roy [4] have the result for $T_n^2$.
Condition (ii) is met when $\lambda^2$ is associated with $\theta_1$ and $\chi_n^2$ or $T_n^2$ with $t_1$.

CONDITION (iii). To consider the third condition of the theorem, we associate
$u_1, \ldots, u_m$ with $\chi_1^2, \ldots, \chi_{n-1}^2$ or $T_{p+1}^2, \ldots, T_{n-1}^2$. It is necessary to show that
the statistics in the sets are functionally independent of each other and of $\chi_n^2$,
$a_{1n}, \ldots, a_{p-1,n}$ or $T_n^2, a_{1n}, \ldots, a_{p-1,n}, S_n$.

Condition (iii) seems intuitively obvious. Formal proof depends on a series of
transformations. We rearrange the elements of $\tilde x$ to yield a $p$ by $n$ matrix $X'$
with $(i, j)$ element $\tilde x_{ij} = x_{ij} - \mu_{i0}$, $i = 1, \ldots, p$, $j = 1, \ldots, n$. Let

(3.3) $Y = MX$

where $M$ is the non-singular $n$-square matrix with $j$th row given by

$(j^{-1/2}, \ldots, j^{-1/2}, 0, \ldots, 0),$

the number of non-zero elements being $j$. The rows of $Y$ now depend on the
sample means of the $p$ variates, the $j$th row on the means of the first $j$
observations, $j = 1, \ldots, n$. At the next stage we transform $Y_j$, the $j$th row of $Y$, so that

(3.4) $Z_j = Y_jA_j$

where $A_j = G$ for the $\chi^2$-test, $j = 1, \ldots, n$, and $A_j = E_j$, $E_jE_j' = S_j^{-1}$, $j =
p + 1, \ldots, n$, for the $T^2$-test; $S_j$ is the sample dispersion matrix similar to $S_n$
but based on the first $j$ observation vectors. A spherical transformation is now
applied to each $Z_j$. For $Z_j$, we obtain new variables $\chi_j^2, a_{1j}, \ldots, a_{p-1,j}$ with
$\chi_j^2 = Z_jZ_j' = j\,{}_j\bar{\tilde x}\,\Sigma^{-1}\,{}_j\bar{\tilde x}'$ or $T_j^2, a_{1j}, \ldots, a_{p-1,j}$ with $T_j^2 = Z_jZ_j' =
j\,{}_j\bar{\tilde x}\,S_j^{-1}\,{}_j\bar{\tilde x}'$. For the $\chi^2$-test, through the series of non-singular, one-to-one
transformations sketched here, $x$ or $\tilde x$ has been transformed to a set of new variables
such that $\chi_1^2, \ldots, \chi_{n-1}^2$ are functionally independent of each other and of $\chi_n^2$,
$a_{1n}, \ldots, a_{p-1,n}$. Similarly, for the $T^2$-test, each successive $T_j^2$ is a function of
one more row of $X$ and $T_n^2, a_{1n}, \ldots, a_{p-1,n}, S_n$ all depend on all rows of $X$.
Condition (iii) follows from these considerations.

CONDITION (iv). We rewrite Condition (iv) in terms of the present problems:

There exists a set of transformations $S$ of $X$ into $X^*$ such that:

(a) $\chi_1^2, \ldots, \chi_n^2$ (or $T_{p+1}^2, \ldots, T_n^2$) are unchanged by all transformations in $S$.

(b) The transformation of $a_{1n}, \ldots, a_{p-1,n}$ (and $S_n$ for the $T^2$-test) into
$a_{1n}^*, \ldots, a_{p-1,n}^*$ (and $S_n^*$) defined by each transformation in $S$ is one-to-one.

(c) If $A_1, \ldots, A_{p-1}$ (and $S_n$) and $A_1^*, \ldots, A_{p-1}^*$ (and $S_n^*$) are two sets
of values of $a_{1n}, \ldots, a_{p-1,n}$ (and $S_n$), each having non-zero probability density
under at least one of the distributions of $X$, there exists a transformation in $S$
such that, if $a_{1n} = A_1, \ldots, a_{p-1,n} = A_{p-1}$ ($S_n = S_n$), then $a_{1n}^* = A_1^*, \ldots,
a_{p-1,n}^* = A_{p-1}^*$ ($S_n^* = S_n^*$).

The necessary classes of transformations are

(3.5) $X^* = XGBG^{-1}$ or $X^* = XEBC'$

respectively for the $\chi^2$- and $T^2$-tests where $B$ is any $p$ by $p$ orthogonal matrix and
$C$ is any non-singular, triangular, $p$ by $p$ matrix.

We first consider the $\chi^2$-test. From (3.3) and (3.4), $Z = MXG$ and $ZB =
MXGB = MX^*G = Z^*$. The transformation from $Z$ to $Z^*$ is orthogonal. Parts
(a) and (b) of Condition (iv) follow at once. The sums of squares of elements
in rows of $Z$ equal the corresponding sums of squares for $Z^*$ and are $\chi_1^2, \ldots, \chi_n^2$.
The transformation of $Z_n$ into $Z_n^*$ is one-to-one for each $B$ and, since $\chi_n^2, a_{1n}, \ldots,
a_{p-1,n}$ follow from a spherical transformation on the elements of $Z_n$ and the
corresponding transformation applies to $Z_n^*$, we have a one-to-one
transformation of $a_{1n}, \ldots, a_{p-1,n}$ to $a_{1n}^*, \ldots, a_{p-1,n}^*$.

If $A_1, \ldots, A_{p-1}$ and $A_1^*, \ldots, A_{p-1}^*$ are two sets of values of $a_{1n}, \ldots, a_{p-1,n}$
suitably restricted between 0 and $\pi$ or 0 and $2\pi$ as the case may be, then $Z_n$
and $Z_n^*$ may be evaluated except for the scalar $\chi_n$, which is the same in both
cases. If these specified values yield $Z_n$ and $Z_n^*$, they are related by $Z_n^* = Z_nB$
and this equation defines $(p - 1)$ independent equations on the elements of $B$.
There are also $p(p + 1)/2$ additional equations on the elements of $B$ imposed
through the requirement that $B$ be orthogonal. The solution for the $p^2$ elements
of $B$ is not unique (except for $p = 2$) but matrices $B$ satisfying the requirements
may be found and this is sufficient for (c) of Condition (iv).

For the $T^2$-test, $Z_j^*/j^{1/2} = {}_j\bar{\tilde x}^* = {}_j\bar{\tilde x}EBC'$ and $S_j^* = CB'E'S_jEBC'$,
$j = p + 1, \ldots, n$. Note that $S_n^* = CC'$ because $E'S_nE = I$. Part (a) of Condition
(iv) follows since

$T_j^{*2} = j\,{}_j\bar{\tilde x}^*S_j^{*-1}{}_j\bar{\tilde x}^{*\prime} = j\,{}_j\bar{\tilde x}EBC'C'^{-1}B'E^{-1}S_j^{-1}E'^{-1}BC^{-1}CB'E'\,{}_j\bar{\tilde x}'$
$\qquad = j\,{}_j\bar{\tilde x}S_j^{-1}{}_j\bar{\tilde x}' = T_j^2, \quad j = p + 1, \ldots, n.$

From (3.3), (3.4) and (3.5) it follows that $Z_n^* = Z_nB$. Given $B$ and $C$, the
transformations of $S_n$ into $S_n^*$ and $Z_n$ into $Z_n^*$ are one-to-one and the transformation of
$Z_n$ into $Z_n^*$ actually transforms $T_n^2, a_{1n}, \ldots, a_{p-1,n}$ into $T_n^{*2}, a_{1n}^*, \ldots, a_{p-1,n}^*$
with $T_n^2 = T_n^{*2}$. Part (b) is thus verified. In regard to Part (c), the existence of
an appropriate $B$ is demonstrated exactly as for the $\chi^2$-test and $C$ is defined by
$S_n^*$ and $E$ by $S_n$. Hence the required transformation in $S$ exists and Part (c) also
follows for the $T^2$-test.

All of the conditions of the theorem have been fulfilled for both the $\chi^2$- and
the $T^2$-tests. Hence the joint p.d.f.'s of $\chi_1^2, \ldots, \chi_n^2$ or $T_{p+1}^2, \ldots, T_n^2$ factor into
$g(\chi_n^2 \mid n\lambda^2)\,l(\chi_1^2, \ldots, \chi_n^2)$ or $g(T_n^2 \mid n\lambda^2)\,l(T_{p+1}^2, \ldots, T_n^2)$ and $p_{1n}/p_{0n}$ can be
written as $g(\chi_n^2 \mid n\lambda_1^2)/g(\chi_n^2 \mid n\lambda_0^2)$ or $g(T_n^2 \mid n\lambda_1^2)/g(T_n^2 \mid n\lambda_0^2)$.

The first of these is the ratio of two non-central $\chi^2$-densities with $p$ degrees
of freedom and non-centrality parameters $n\lambda_1^2$ and $n\lambda_0^2$, reducing to (1.5); the
second is the ratio of two non-central $T^2$-densities with degrees of freedom $p$
and non-centrality parameters $n\lambda_1^2$ and $n\lambda_0^2$, reducing to (1.6).

In many situations $\lambda_0^2 = 0$ and then (1.5) reduces to

(3.6) $p_{1n}/p_{0n} = e^{-n\lambda_1^2/2}\;{}_0F_1(p/2;\, n\lambda_1^2\chi_n^2/4)$

while (1.6) becomes

(3.7) $p_{1n}/p_{0n} = e^{-n\lambda_1^2/2}\;{}_1F_1[n/2,\, p/2;\, n\lambda_1^2T_n^2/2(n - 1 + T_n^2)].$

Furthermore, if $p = 1$, (3.6) reduces to

(3.8) $p_{1n}/p_{0n} = e^{-n\lambda_1^2/2}\cosh\Big[\lambda_1\sum_{j=1}^{n}(x_j - \mu_0)/\sigma\Big]$

and (3.7) to

(3.9) $p_{1n}/p_{0n} = e^{-n\lambda_1^2/2}\;{}_1F_1[n/2,\, 1/2;\, n\lambda_1^2t^2/2(n - 1 + t^2)].$

Equation (3.8) is equation 9.5, page 135, in Wald's book; equation (3.9) is
equation 5 given by Rushton [25] for the univariate sequential t-test.
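The reduction of (1.5) to the hyperbolic cosine form (3.8) can be checked numerically. The sketch below is an illustration only: `hyp0f1` is a plain power-series evaluation written for this example (an assumption, adequate for moderate arguments), and `chi2_probability_ratio` is a hypothetical name for the right-hand side of (1.5).

```python
import math

def hyp0f1(b, z, tol=1e-16, max_terms=100000):
    """Power series for 0F1(b; z) = sum_k z^k / ((b)_k k!), for z >= 0."""
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        term *= z / ((b + k) * (k + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def chi2_probability_ratio(chi2_n, n, p, lam0_sq, lam1_sq):
    """Right-hand side of (1.5) for the sequential chi-square test."""
    num = hyp0f1(p / 2.0, n * lam1_sq * chi2_n / 4.0)
    den = hyp0f1(p / 2.0, n * lam0_sq * chi2_n / 4.0)
    return math.exp(-n * (lam1_sq - lam0_sq) / 2.0) * num / den

# p = 1 and lambda_0^2 = 0: (1.5) should agree with the cosh form of (3.8),
# since sqrt(n * lambda_1^2 * chi_n^2) equals lambda_1 |sum (x_j - mu_0)| / sigma.
n, lam1_sq, chi2_n = 10, 0.25, 3.0
ratio = chi2_probability_ratio(chi2_n, n, 1, 0.0, lam1_sq)
cosh_form = math.exp(-n * lam1_sq / 2.0) * math.cosh(math.sqrt(n * lam1_sq * chi2_n))
```

The agreement rests on the identity ${}_0F_1(1/2;\, z^2/4) = \cosh z$.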

A referee and W. J. Hall have pointed out that Cox's Theorem may be
bypassed through use of an unpublished theorem of Stein and the fact that $\chi_n^2$
and $T_n^2$ are maximal invariants of sufficient statistics under certain groups of
linear transformations, implying that they are sufficient for any invariant
statistics. This means that $\chi_n^2$ is sufficient for $(\chi_1^2, \ldots, \chi_n^2)$ and $T_n^2$ is sufficient for
$(T_{p+1}^2, \ldots, T_n^2)$ and the factorizations resulting from Cox's Theorem are
immediate. The reader is referred to Hall [12]. We have not relied on this approach
for reasons set forth at the beginning of this section and because the
transformations involved are of interest in themselves.


4. Termination proofs. Let $\underline\chi_n^2$ and $\bar\chi_n^2$ be the boundary values for $\chi_n^2$
corresponding to $p_{1n}/p_{0n} = \beta/(1 - \alpha)$ and $p_{1n}/p_{0n} = (1 - \beta)/\alpha$ where $p_{1n}/p_{0n}$ is
defined in (1.5).

We assert that

$P(\text{Sample Size} > n) \le P(\underline\chi_n^2 < \chi_n^2 < \bar\chi_n^2) = P_n$

and proof of termination follows if we show that $\lim_{n\to\infty} P_n = 0$. This approach
is similar to that given by Ray [23] who considered sequential analysis of variance.
Set $U_n^2 = \chi_n^2/n$ and consider the corresponding limits $\underline U_n^2$ and $\bar U_n^2$. Erdelyi
et al. [7] show that

${}_0F_1(c;\, x) = e^{-2x^{1/2}}\;{}_1F_1[(2c - 1)/2,\, 2c - 1;\, 4x^{1/2}]$

and, when this is applied in (1.5) and $\chi_n^2$ replaced by $nU_n^2$, we have

$p_{1n}/p_{0n} = f_n(U_n^2) = \exp[\{-n(\lambda_1^2 - \lambda_0^2)/2\} - (n^2\lambda_1^2U_n^2)^{1/2} + (n^2\lambda_0^2U_n^2)^{1/2}]$
$\qquad \times\; {}_1F_1[(p - 1)/2,\, p - 1;\, 2(n^2\lambda_1^2U_n^2)^{1/2}]\,/\,{}_1F_1[(p - 1)/2,\, p - 1;\, 2(n^2\lambda_0^2U_n^2)^{1/2}].$

Intersections of the family of curves $y = g_n(U^2) = \ln f_n(U^2)$ and of
$y = \ln[\beta/(1 - \alpha)]$ and $y = \ln[(1 - \beta)/\alpha]$ determine $\underline U_n^2$ and $\bar U_n^2$ respectively.
It can be shown that

$g_n'(U^2) > 0$ for $U^2 > 0$

and

$\lim_{U^2\to\infty} g_n(U^2) = \infty.$

It may be demonstrated also that

$g_n(U^2) = n\{-(\lambda_1^2 - \lambda_0^2)/2 + (U^2)^{1/2}[(\lambda_1^2)^{1/2} - (\lambda_0^2)^{1/2}] + O(1/n)\}.$

These results are straightforward, being based on the proposition that

${}_1F_1(a, c; x) = [\Gamma(c)/\Gamma(a)]\,e^x x^{a-c}[1 + O(|x|^{-1})]$

given by Erdelyi. It follows that $y = g_n(U^2)$ defines a family of curves starting
at $y = -n(\lambda_1^2 - \lambda_0^2)/2$ for $U^2 = 0$ and increasing strictly to $+\infty$ as $U^2 \to \infty$.
Hence $g_n(U^2) = 0$ has one root and it is

$U_0^2 = [\lambda_1^2 + 2(\lambda_1^2\lambda_0^2)^{1/2} + \lambda_0^2]/4 + O(1/n).$

The intersection of the horizontal line $y = \ln[\beta/(1 - \alpha)]$ with $y = g_n(U^2)$
occurs where

$-[(\lambda_1^2 - \lambda_0^2)/2] + (U^2)^{1/2}\{(\lambda_1^2)^{1/2} - (\lambda_0^2)^{1/2}\} + O(1/n) = n^{-1}\ln[\beta/(1 - \alpha)]$

and hence

$\underline U_n^2 = [\lambda_1^2 + 2(\lambda_1^2\lambda_0^2)^{1/2} + \lambda_0^2]/4 + O(1/n) = U_0^2 + O(1/n).$

Similarly,

$\bar U_n^2 = U_0^2 + O(1/n).$
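The location of the root of $g_n(U^2) = 0$ can be examined numerically. The sketch below is an illustration under stated assumptions: it evaluates $g_n$ directly from (1.5) with a plain power series for ${}_0F_1$, finds the root by bisection, and compares it with $[\lambda_1^2 + 2(\lambda_1^2\lambda_0^2)^{1/2} + \lambda_0^2]/4$. All function names and the chosen values $p = 3$, $\lambda_0^2 = 0.25$, $\lambda_1^2 = 1$ are hypothetical.

```python
import math

def hyp0f1(b, z, tol=1e-16, max_terms=100000):
    """Power series for 0F1(b; z), adequate for the positive z used here."""
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        term *= z / ((b + k) * (k + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def g_n(U2, n, p, lam0_sq, lam1_sq):
    """Logarithm of the ratio (1.5) with chi_n^2 replaced by n * U^2."""
    chi2 = n * U2
    return (-n * (lam1_sq - lam0_sq) / 2.0
            + math.log(hyp0f1(p / 2.0, n * lam1_sq * chi2 / 4.0))
            - math.log(hyp0f1(p / 2.0, n * lam0_sq * chi2 / 4.0)))

def root_of_g(n, p, lam0_sq, lam1_sq, iterations=100):
    """Bisection for g_n(U^2) = 0; g_n is increasing in U^2."""
    lo, hi = 0.0, (math.sqrt(lam0_sq) + math.sqrt(lam1_sq)) ** 2
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g_n(mid, n, p, lam0_sq, lam1_sq) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def u0_sq(lam0_sq, lam1_sq):
    """Leading term [lam1^2 + 2 (lam1^2 lam0^2)^{1/2} + lam0^2] / 4."""
    return (lam1_sq + 2.0 * math.sqrt(lam1_sq * lam0_sq) + lam0_sq) / 4.0

limit = u0_sq(0.25, 1.0)
err_100 = abs(root_of_g(100, 3, 0.25, 1.0) - limit)
err_400 = abs(root_of_g(400, 3, 0.25, 1.0) - limit)
```

The error should shrink roughly in proportion to $1/n$ as $n$ grows, which is the behaviour the $O(1/n)$ term describes.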


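Consider $U_n^2$. This is a random variable that converges stochastically to
$\lambda^2 = (\mu - \mu_0)\Sigma^{-1}(\mu - \mu_0)'$ as $n \to \infty$. If $\lambda^2 \ne U_0^2$, the sequential process
terminates with probability 1. If $\lambda^2 = U_0^2$, more powerful methods are required.

The work of David and Kruskal [6] suggested the following argument. Let
$P_T$ be the probability of termination of the sequential test.

$P_T \ge 1 - P(\underline\chi_n^2 \le \chi_n^2 \le \bar\chi_n^2) \ge 1 - (\bar\chi_n^2 - \underline\chi_n^2)\sup_{(\underline\chi_n^2,\,\bar\chi_n^2)} h(\chi^2;\, p,\, n\lambda^2)$

where $h(\chi^2; p, n\lambda^2)$ is the noncentral chi-square density with $p$ degrees of freedom
and parameter of noncentrality $n\lambda^2$. We show below that $\lim_{n\to\infty}(\bar\chi_n^2 - \underline\chi_n^2)$ is
finite, and $\lim_{n\to\infty}\sup_{(\underline\chi_n^2,\,\bar\chi_n^2)} h(\chi^2; p, n\lambda^2) = 0$ as may be shown by examination
of that density. Hence $P_T \ge \lim_{n\to\infty}[1 - P(\underline\chi_n^2 \le \chi_n^2 \le \bar\chi_n^2)] = 1$ and $P_T = 1$. It
remains to show that $\lim_{n\to\infty}(\bar\chi_n^2 - \underline\chi_n^2)$ is finite.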

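We use $U_n^2$, $\underline{U}_n^2$ and $\overline{U}_n^2$ and recall that $\underline{U}_n^2$ and $\overline{U}_n^2$ are such that $g_n(\underline{U}_n^2) =
\ln[\beta/(1 - \alpha)]$ and $g_n(\overline{U}_n^2) = \ln[(1 - \beta)/\alpha]$. Consider $g_n(U^2)$ in the interval
$(\underline{U}_n^2, \overline{U}_n^2)$. We apply the law of the mean in this interval and divide both sides
by $n$ to obtain

$$\frac{\ln[(1 - \alpha)(1 - \beta)/\alpha\beta]}{n(\overline{U}_n^2 - \underline{U}_n^2)} = \frac{g_n'[\underline{U}_n^2 + \theta(\overline{U}_n^2 - \underline{U}_n^2)]}{n} \tag{4.1}$$

where $0 \le \theta \le 1$. Now

$$\frac{g_n'(U^2)}{n} = \frac{1}{2}\left(\frac{\lambda_1^2}{U^2}\right)^{1/2}\left\{\frac{{}_1F_1[(p + 1)/2, p; 2(n^2\lambda_1^2U^2)^{1/2}]}{{}_1F_1[(p - 1)/2, p - 1; 2(n^2\lambda_1^2U^2)^{1/2}]} - 1\right\} - \frac{1}{2}\left(\frac{\lambda_0^2}{U^2}\right)^{1/2}\left\{\frac{{}_1F_1[(p + 1)/2, p; 2(n^2\lambda_0^2U^2)^{1/2}]}{{}_1F_1[(p - 1)/2, p - 1; 2(n^2\lambda_0^2U^2)^{1/2}]} - 1\right\}.$$

Also, ${}_1F_1(a, c; x) \sim [\Gamma(c)/\Gamma(a)]e^x x^{a-c}[1 + O(|x|^{-1})]$ and

$$\frac{g_n'(U^2)}{n} = \frac{1}{2}\left[\left(\frac{\lambda_1^2}{U^2}\right)^{1/2} - \left(\frac{\lambda_0^2}{U^2}\right)^{1/2}\right]\{1 + O(1/n)\}$$

and in the neighborhood of $U_0^2$ [in the interval $(\underline{U}_n^2, \overline{U}_n^2)$],

$$\frac{g_n'(U^2)}{n} = \frac{(\lambda_1^2)^{1/2} - (\lambda_0^2)^{1/2}}{(\lambda_1^2)^{1/2} + (\lambda_0^2)^{1/2}} + O(1/n)$$

and $\lim_{n\to\infty} [g_n'(U^2)/n] > 0$ in the neighborhood when $\lambda_1^2 > \lambda_0^2$ as required. Returning
to (4.1), we find that the right-hand side has a finite limit and consequently
$n(\overline{U}_n^2 - \underline{U}_n^2) = \overline{\chi}_n^2 - \underline{\chi}_n^2$ has a finite limit.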
The termination proof for the sequential $\chi^2$-test is now complete.
It is well known that the non-central $T^2$-distribution approaches the non-central
$\chi^2$-distribution asymptotically with $n$. Then the argument above applies.
$\underline{T}_n^2$ and $\overline{T}_n^2$, the boundary values for the Sequential $T^2$-test, approach
$n[\lambda_1^2 + 2(\lambda_1^2\lambda_0^2)^{1/2} + \lambda_0^2]/4 + O(1/n)$ also.


5. Two-sample cases. The sequential techniques discussed can also be used
for two-sample tests with paired observation vectors. Let the first population
have mean vector $\mu^{(1)}$ and dispersion matrix $\Sigma_{11}$, the second $\mu^{(2)}$ and $\Sigma_{22}$. Suppose
further that the cross-covariance matrix is $\Sigma_{12}$. Let $\mu = \mu^{(1)} - \mu^{(2)}$ and $x_i =
x_i^{(1)} - x_i^{(2)}$, $i = 1, \cdots, n$, where $x_i^{(1)}$ and $x_i^{(2)}$ are respectively the $i$th observation
vectors for populations 1 and 2. The dispersion matrix of $x$ is $\Sigma_{11} + \Sigma_{22} - \Sigma_{12} -
\Sigma_{12}'$. Now when $\Sigma_{11}$, $\Sigma_{22}$ and $\Sigma_{12}$ are known, the two-sample problem is reduced
to an application of the Sequential $\chi^2$-test.
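
The dispersion matrix of the paired difference can be checked in one line. The sketch below assumes only that $\Sigma_{12}$ denotes the covariance of $x^{(1)}$ with $x^{(2)}$ within a pair, so that the covariance of $x^{(2)}$ with $x^{(1)}$ is its transpose; the operator name Cov is used here for illustration.

```latex
\begin{aligned}
\operatorname{Cov}\bigl(x^{(1)} - x^{(2)}\bigr)
  &= \operatorname{Cov}\bigl(x^{(1)}\bigr) + \operatorname{Cov}\bigl(x^{(2)}\bigr)
     - \operatorname{Cov}\bigl(x^{(1)}, x^{(2)}\bigr)
     - \operatorname{Cov}\bigl(x^{(2)}, x^{(1)}\bigr) \\
  &= \Sigma_{11} + \Sigma_{22} - \Sigma_{12} - \Sigma_{12}' .
\end{aligned}
```

When the two members of a pair are uncorrelated, $\Sigma_{12} = 0$ and the expression reduces to $\Sigma_{11} + \Sigma_{22}$.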

When the variance-covariance matrices are not known and must be estimated,
the situation is even simpler. Again we define $\mu = \mu^{(1)} - \mu^{(2)}$ and use variates
$x_i = x_i^{(1)} - x_i^{(2)}$. $\Sigma_{11} + \Sigma_{22} - \Sigma_{12} - \Sigma_{12}'$ is estimated directly from the observation
vectors $x_i = x_i^{(1)} - x_i^{(2)}$, $i = 1, \cdots, n$, and this problem is reduced to that handled
by the Sequential $T^2$-test.


6. The ASN functions. In the planning of sequential experiments information 
on the expected sample sizes (S) is desirable. Wald [27] established approximate 
procedures for determining the ASN function when sequential observations are 
independent. These procedures have not been demonstrated to be valid when 
successive values, not independent, of a test statistic are considered. 

Johnson [18] circumvented the problem by sampling additional groups rather
than additional items within a group and hence considered successive independent
values of a test statistic. This could be done for the Sequential $\chi^2$- and
$T^2$-tests but usually would require too many observations.

Rushton [24, 25], in discussing the univariate sequential t-test, reduced the
problem to a one-parameter problem by assuming that, after a number of observations,
the variance was known. This led to lower bounds on the ASN function.
This procedure does not seem applicable in the multivariate case because,
even if $\Sigma$ is assumed known, we are still dealing with composite hypotheses.

A third approach is the Monte Carlo technique. This has been used for the 
univariate sequential t-test and Freund and Appleby [9] have studied our tests. 

A fourth method may be attempted. The method is based on the fact that,
if one ignores excesses over the boundaries at the termination of a sequential test,

$$\mathcal{E}[\ln (p_{1n}/p_{0n})] = (1 - \alpha)\ln[\beta/(1 - \alpha)] + \alpha\ln[(1 - \beta)/\alpha] \tag{6.1}$$

where $H_0$ is true and

$$\mathcal{E}[\ln (p_{1n}/p_{0n})] = \beta\ln[\beta/(1 - \alpha)] + (1 - \beta)\ln[(1 - \beta)/\alpha] \tag{6.2}$$

where $H_1$ is true. Now, in general, $\ln (p_{1n}/p_{0n})$ will depend on $n$ and a test statistic
$T_n$ based on the first $n$ observations. What is needed is to express $\mathcal{E}[\ln (p_{1n}/p_{0n})]$
as a function of $\mathcal{E}(n)$, the ASN number, (and of the parameters involved) and
to then solve (6.1) and (6.2) for $\mathcal{E}_0(n)$ and $\mathcal{E}_1(n)$, the required ASN numbers
under $H_0$ and $H_1$ respectively. But a way of doing this has not been found for
any sequential tests of composite hypotheses so far as can be ascertained by the
authors. Bhate, in unpublished work, proposed approximating to $\mathcal{E}[\ln (p_{1n}/p_{0n})]$
by replacing $T_n$ and $n$ in $\ln (p_{1n}/p_{0n})$ by $\mathcal{E}[T_n \mid n = \mathcal{E}(n)]$, the fixed sample-size
expectation of $T_n$ given that $n = \mathcal{E}(n)$, and $\mathcal{E}(n)$ respectively. The expectation
$\mathcal{E}[T_n \mid n = \mathcal{E}(n)]$ is obtained under $H_0$ for (6.1) and under $H_1$ for (6.2). This
procedure is seen intuitively to give a "central value" for the distribution of
$\ln (p_{1n}/p_{0n})$ and, upon appropriate substitutions in (6.1) and (6.2), to give
equations in $\mathcal{E}_0(n)$ and $\mathcal{E}_1(n)$, values of $\mathcal{E}(n)$ under $H_0$ and $H_1$ respectively, for
solution. The method, crude though it may appear, has been used in a number
of situations, for example, by Ray [22] for sequential analysis of variance and
most recently by Hajnal [11] for a two-sample sequential t-test.

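We have tried the method for the sequential $\chi^2$- and $T^2$-tests of this paper.
Since $\mathcal{E}[\chi_n^2 \mid n = \mathcal{E}(n)] = (p + n\lambda^2)|_{n=\mathcal{E}(n)}$ and, using $p_{1n}/p_{0n}$ in (1.5), we obtained
the equations

$$-\tfrac{1}{2}n(\lambda_1^2 - \lambda_0^2) + \ln {}_0F_1[p/2; n\lambda_1^2(p + n\lambda_0^2)/4] - \ln {}_0F_1[p/2; n\lambda_0^2(p + n\lambda_0^2)/4] = (1 - \alpha)\ln[\beta/(1 - \alpha)] + \alpha\ln[(1 - \beta)/\alpha] \tag{6.3}$$

and

$$-\tfrac{1}{2}n(\lambda_1^2 - \lambda_0^2) + \ln {}_0F_1[p/2; n\lambda_1^2(p + n\lambda_1^2)/4] - \ln {}_0F_1[p/2; n\lambda_0^2(p + n\lambda_1^2)/4] = \beta\ln[\beta/(1 - \alpha)] + (1 - \beta)\ln[(1 - \beta)/\alpha]. \tag{6.4}$$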


For brevity, these equations have been written in terms of $n$ but solution of
(6.3) for $n$ yields $\mathcal{E}_0(n)$ and of (6.4), $\mathcal{E}_1(n)$, both for the $\chi^2$-test. For the $T^2$-test,
let $x = T_n^2/2(n - 1 + T_n^2)$ and note that

$$\mathcal{E}[x \mid n = \mathcal{E}(n)] = (n\lambda^2/2)\{1 - [\exp(-n\lambda^2/2)][(n - p)/n]\,{}_1F_1[n/2, (n + 2)/2; n\lambda^2/2]\}\big|_{n=\mathcal{E}(n)}$$

following Wishart [29]. With $p_{1n}/p_{0n}$ in (1.6), the equations for the $T^2$-test corresponding
with (6.3) and (6.4) are

$$-\tfrac{1}{2}n(\lambda_1^2 - \lambda_0^2) + \ln {}_1F_1[n/2, p/2; n\lambda_1^2\mathcal{E}_0(x)] - \ln {}_1F_1[n/2, p/2; n\lambda_0^2\mathcal{E}_0(x)] = (1 - \alpha)\ln[\beta/(1 - \alpha)] + \alpha\ln[(1 - \beta)/\alpha] \tag{6.5}$$

and

$$-\tfrac{1}{2}n(\lambda_1^2 - \lambda_0^2) + \ln {}_1F_1[n/2, p/2; n\lambda_1^2\mathcal{E}_1(x)] - \ln {}_1F_1[n/2, p/2; n\lambda_0^2\mathcal{E}_1(x)] = \beta\ln[\beta/(1 - \alpha)] + (1 - \beta)\ln[(1 - \beta)/\alpha] \tag{6.6}$$

where $\mathcal{E}_0(x) = \mathcal{E}[x \mid n = \mathcal{E}(n)]|_{\lambda^2=\lambda_0^2}$ and $\mathcal{E}_1(x) = \mathcal{E}[x \mid n = \mathcal{E}(n)]|_{\lambda^2=\lambda_1^2}$.

Again we have left (6.5) and (6.6) in terms of $n$ for simplicity but note that
the solution of (6.5) for $n$ yields $\mathcal{E}_0(n)$ and of (6.6), $\mathcal{E}_1(n)$.

Solutions of (6.3) and (6.4) or of (6.5) and (6.6) do seem to provide the 
desired guides on the numbers of experimental units required for the planning of 
sequential experimentation. Solution of these equations can be accomplished 
iteratively with a high-speed computer and, since applications are likely to be 
repetitive, this need only be done initially in setting up an experimental or 
control program. 
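
The iterative solution mentioned above can be sketched for the $\chi^2$-test equations (6.3) and (6.4). This is a minimal illustration, not the authors' program: the function names (`log_hyp0f1`, `bhate_residual`, `bhate_asn_chi2`) are hypothetical, ${}_0F_1$ is summed from its power series in log space, and plain bisection stands in for whatever iteration was actually used.

```python
import math

def log_hyp0f1(c, z):
    """ln 0F1(c; z) for c > 0, z >= 0, from the series sum_k z^k / ((c)_k k!)."""
    if z == 0.0:
        return 0.0
    log_terms, peak, k = [], -math.inf, 0
    while True:
        # log of the k-th series term, using lgamma for (c)_k and k!
        t = k * math.log(z) - (math.lgamma(c + k) - math.lgamma(c)) - math.lgamma(k + 1)
        log_terms.append(t)
        peak = max(peak, t)
        # stop once past the largest term and 40 log-units below it
        if k > math.sqrt(z) and t < peak - 40.0:
            break
        k += 1
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def bhate_residual(n, p, lam0_sq, lam1_sq, alpha, beta, under_h1):
    """Left side minus right side of (6.3) (under_h1=False) or (6.4) (under_h1=True)."""
    lam_true = lam1_sq if under_h1 else lam0_sq
    stat = p + n * lam_true  # fixed-sample expectation of chi-square_n
    lhs = (-0.5 * n * (lam1_sq - lam0_sq)
           + log_hyp0f1(p / 2.0, n * lam1_sq * stat / 4.0)
           - log_hyp0f1(p / 2.0, n * lam0_sq * stat / 4.0))
    lo_bound = math.log(beta / (1.0 - alpha))
    hi_bound = math.log((1.0 - beta) / alpha)
    if under_h1:
        rhs = beta * lo_bound + (1.0 - beta) * hi_bound
    else:
        rhs = (1.0 - alpha) * lo_bound + alpha * hi_bound
    return lhs - rhs

def bhate_asn_chi2(p, lam0_sq, lam1_sq, alpha, beta, under_h1=False):
    """Solve (6.3) or (6.4) for n by bisection; returns the conjectured ASN."""
    f = lambda n: bhate_residual(n, p, lam0_sq, lam1_sq, alpha, beta, under_h1)
    lo, hi = 0.0, 1.0
    while f(lo) * f(hi) > 0.0:  # expand until the root is bracketed
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("no root bracketed")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, `bhate_asn_chi2(2, 0.0, 1.0, 0.05, 0.05)` gives the conjectured ASN under $H_0$ for $p = 2$, $\lambda_0^2 = 0$, $\lambda_1^2 = 1$, and passing `under_h1=True` gives the value under $H_1$. The $T^2$ equations (6.5) and (6.6) would need ${}_1F_1$ in place of ${}_0F_1$ and are not covered by this sketch.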

The principal justification for the Bhate method of approximating ASN
numbers is that results agree sufficiently well with Monte Carlo studies for practical
purposes. For verification of this, the reader is referred to the paper by
Freund and Appleby. One other study, conducted by K. J. Arnold (Natl. Bur.
Stds. [21]), is available for the sequential t-test with $p = 1$, $\lambda_0^2 = 0$, $\lambda_1^2 = 1.0$
and $\alpha = \beta = .05$. In that study 500 sets of observations were sampled for the
two values of $\lambda^2$; the average sample size to reach a decision under $H_0$ was 10.0
while the conjectural value was 10.7 and under $H_1$ was 11.2 compared to 9.7.
It is interesting to note that the actual $\alpha$- and $\beta$-values from this study were
.044 and .034 respectively, somewhat different from the intended values. Ray
[22] used this second example also but a rounding error occurs which makes his
conjectured values appear to be closer to Arnold's results than they really are.


7. Generalized $\chi^2$- and $T^2$-statistics. In addition to the $\chi^2$- and $T^2$-statistics
already discussed, there are two others in each case which deserve mention and
complete the families of $\chi^2$- and $T^2$-statistics (Hotelling [13]). We adopt
Hotelling's notation and also drop the subscript $n$ used on $\chi^2$, $T^2$, $S$ and $\bar{x}$. The
$\chi^2$-test so far considered is $\chi_M^2$ in Hotelling's notation and $\Sigma$ becomes $\Sigma_0$. Now
$\lambda^2 = (\mu - \mu_0)\Sigma_0^{-1}(\mu - \mu_0)'$ becomes $\lambda_M^2$ and the sequential $\chi^2$-test is for the
hypotheses,

$$H_0: \lambda_M^2 = \lambda_{M0}^2, \qquad H_1: \lambda_M^2 = \lambda_{M1}^2. \tag{7.1}$$

The second statistic to be considered is

$$\chi_D^2 = (n - 1)\,\mathrm{tr}\, S\Sigma_0^{-1} \tag{7.2}$$

for hypotheses

$$H_0: \Sigma = \Sigma_0 \ (\text{or } \lambda_D^2 = \mathrm{tr}\, \Sigma\Sigma_0^{-1} = \lambda_{D0}^2), \qquad H_1: \Sigma \ne \Sigma_0 \ (\text{or } \lambda_D^2 = \lambda_{D1}^2 > \lambda_{D0}^2). \tag{7.3}$$

Usually we shall want $\lambda_{D0}^2 = p$ but this is not essential. $\chi_D^2$ is distributed like $\chi^2$
with $(n - 1)p$ degrees of freedom and noncentrality parameter $(n - 1)\lambda_D^2$.
$\chi_M^2$ and $\chi_D^2$ are multivariate extensions of univariate tests based on $\bar{x}$ and $s$.
The sum $\chi_O^2 = \chi_M^2 + \chi_D^2$ is a measure of the overall variation of the sample
from standard. $\chi_O^2$ is distributed like $\chi^2$ with $np$ degrees of freedom and parameter
of noncentrality $n\lambda_O^2 = n\lambda_M^2 + (n - 1)\lambda_D^2$. An alternative form for $\chi_O^2$ is

$$\chi_O^2 = \sum_{i=1}^{n} \chi_i^2 \tag{7.4}$$

where $\chi_i^2 = (x_i - \mu_0)\Sigma_0^{-1}(x_i - \mu_0)'$, $i = 1, \cdots, n$. $\chi_D^2$ could be obtained by subtraction,
$\chi_O^2 - \chi_M^2$, and $S$ need not be computed. Logical hypotheses for use with
$\chi_O^2$ appear to be

$$H_0: n\lambda_O^2 = n\lambda_{M0}^2 + (n - 1)\lambda_{D0}^2 = n\lambda_{O0}^2, \qquad H_1: n\lambda_O^2 = n\lambda_{M1}^2 + (n - 1)\lambda_{D1}^2 = n\lambda_{O1}^2. \tag{7.5}$$
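
The subtraction remark following (7.4) rests on a standard sum-of-squares split. A short derivation follows, assuming (as is not restated in this section) that $\chi_M^2 = n(\bar{x} - \mu_0)\Sigma_0^{-1}(\bar{x} - \mu_0)'$ and that $S$ is the sample dispersion matrix with divisor $n - 1$.

```latex
\begin{aligned}
\chi_O^2 &= \sum_{i=1}^{n} (x_i - \mu_0)\Sigma_0^{-1}(x_i - \mu_0)' \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})\Sigma_0^{-1}(x_i - \bar{x})'
     + n(\bar{x} - \mu_0)\Sigma_0^{-1}(\bar{x} - \mu_0)' \\
  &= (n - 1)\operatorname{tr} S\Sigma_0^{-1} + \chi_M^2
   = \chi_D^2 + \chi_M^2 .
\end{aligned}
```

The cross-product term vanishes because $\sum_i (x_i - \bar{x}) = 0$, which is why only the running sum of the $\chi_i^2$ and the running mean are needed after each observation.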


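Sequential tests may be developed for (7.3) and (7.5) based on $\chi_D^2$ and $\chi_O^2$.
The probability-ratio statistics are respectively

$$p_{1n}/p_{0n} = e^{-(n-1)(\lambda_{D1}^2 - \lambda_{D0}^2)/2}\, \frac{{}_0F_1[(n - 1)p/2; (n - 1)\lambda_{D1}^2\chi_D^2/4]}{{}_0F_1[(n - 1)p/2; (n - 1)\lambda_{D0}^2\chi_D^2/4]}$$

and

$$p_{1n}/p_{0n} = e^{-n(\lambda_{O1}^2 - \lambda_{O0}^2)/2}\, \frac{{}_0F_1[np/2; n\lambda_{O1}^2\chi_O^2/4]}{{}_0F_1[np/2; n\lambda_{O0}^2\chi_O^2/4]}.$$

These sequential $\chi^2$-tests are developed just as the one based on $\chi_M^2$, and would
depend on the same set of tables for values of $\underline{\chi}^2$ and $\overline{\chi}^2$ except that often for $\chi_M^2$
we would have $\lambda_{M0}^2 = 0$ and here tables are required for cases where neither null
nor non-null values of $\lambda^2$ are zero.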

If the family of sequential $\chi^2$-tests were used, say, in sampling inspection, the
inspector could ascertain after each item inspected

(i) whether or not the sample means differed significantly from standard,

(ii) whether or not the variation about sample means was greater than the
preassigned $\Sigma_0$, and

(iii) whether or not the overall variability of the sample is larger than should
have been expected.

Generalizations of the sequential $T^2$-test are not directly available; generalizations
of the non-sequential $T^2$-test were developed. $T^2$ in this paper corresponds
to $T_M^2$ in Hotelling's notation. $T_D^2$ of Hotelling generally represents the variability
in a subgroup of an experiment compared to, say, the average subgroup
variability of an experiment. Rarely would such situations occur in sequential
experimentation. Somewhat more conceivable is the situation where a sequential
$T_M^2$-test is run in parallel with a $\chi_D^2$-test; the test on variances is based on previous
experience but the test on means depends only on the variability of the sample
itself. $T_O^2$ and $T_D^2$ are useful statistics in the multivariate analysis of variance
and could perhaps be used in sequential multivariate schemes when more is
known about the forms of their distributions. Sequential tests for the roots of
determinantal equations might also prove useful and feasible but computational
procedures would be difficult.


8. Discussion. We now discuss some problems that arise in using the sequential 
methods developed in this paper. 

(i) Tables. Direct applications of our sequential procedures involve comparison
of the probability ratio at each stage with $\beta/(1 - \alpha)$ and $(1 - \beta)/\alpha$. This
is laborious and requires evaluation of either ${}_0F_1(c; x)$ or ${}_1F_1(a, c; x)$ after each
observation. Tables of both functions are available (Jackson [15], Nath [20],
Rushton and Lang [26]) but Lagrangian interpolation of the logarithms of these
functions is still necessary in most cases. It is better to prepare tables of the
boundary values $\underline{\chi}_n^2$ and $\overline{\chi}_n^2$ and $\underline{T}_n^2$ and $\overline{T}_n^2$ so that only the test statistic is computed
in applications. Tables now completed for $\alpha = \beta = .05$ are given by
Jackson and Bradley [16] and show $\underline{\chi}_n^2$, $\overline{\chi}_n^2$, $\underline{T}_n^2$ and $\overline{T}_n^2$ for $p = 2(1)9$; $\lambda_0^2 = 0$;
$\lambda_1^2 = .5, 1.0, 2.0$; maximum $n$: 60 for $\lambda_1^2 = .5$, 45 for $\lambda_1^2 = 1.0$, 30 for $\lambda_1^2 = 2.0$.
R. J. Freund with Jackson at the Virginia Polytechnic Institute has completed
some additional tables and a report [10] has been prepared. Publication of a
separate volume of tables is contemplated when this work is complete.
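
The direct comparison described in (i) can be sketched in a few lines where a computer is available. The sketch assumes that the log probability ratio for the sequential $\chi^2$-test has the form $-\tfrac{1}{2}n(\lambda_1^2 - \lambda_0^2) + \ln {}_0F_1[p/2; n\lambda_1^2\chi_n^2/4] - \ln {}_0F_1[p/2; n\lambda_0^2\chi_n^2/4]$, which is inferred here from (6.3) rather than quoted from (1.5). The function names are hypothetical, and ${}_0F_1$ is summed from its series instead of being read from tables.

```python
import math

def log_hyp0f1(c, z):
    """ln 0F1(c; z) for c > 0, z >= 0, summed from the power series in log space."""
    if z == 0.0:
        return 0.0
    log_terms, peak, k = [], -math.inf, 0
    while True:
        t = k * math.log(z) - (math.lgamma(c + k) - math.lgamma(c)) - math.lgamma(k + 1)
        log_terms.append(t)
        peak = max(peak, t)
        if k > math.sqrt(z) and t < peak - 40.0:
            break
        k += 1
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def sequential_chi2_decision(chi2_n, n, p, lam0_sq, lam1_sq, alpha, beta):
    """Compare the (assumed) log probability ratio with Wald's two limits."""
    log_ratio = (-0.5 * n * (lam1_sq - lam0_sq)
                 + log_hyp0f1(p / 2.0, n * lam1_sq * chi2_n / 4.0)
                 - log_hyp0f1(p / 2.0, n * lam0_sq * chi2_n / 4.0))
    if log_ratio <= math.log(beta / (1.0 - alpha)):
        return "accept H0"
    if log_ratio >= math.log((1.0 - beta) / alpha):
        return "accept H1"
    return "continue"
```

Calling `sequential_chi2_decision` after each observation with the current value of the statistic reproduces the item-by-item procedure; precomputed boundary tables replace the two `log_hyp0f1` evaluations with a single lookup.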

(ii) Determination of $H_0$ and $H_1$. Specifications of values of the noncentrality
parameter $\lambda^2$ lead to difficult administrative decisions. For sequential tests for
means we would often take $\lambda_0^2 = 0$ corresponding to $\mu = \mu_0$. Determination of
$\lambda_1^2$ is much more difficult in the multivariate case than for the univariate case
since a $p$-dimensional ellipsoid related to problem specifications must be
visualized. No single rule on specifying $\lambda_1^2$ can be given and each problem has
to be handled individually. Jackson and Bradley give some examples in connection
with the sampling inspection of ballistic missiles and a paper showing these
applications has been accepted for publication [17].

Sequential procedures should also be extended to cover the use of one-sided 
tolerances and essentially generalize the work of Goldberg (Wallis [28]). 

(iii) OC and ASN functions and truncation. No explicit or even approximate 
expressions yet exist for the OC and ASN functions when the hypotheses under 
consideration are composite. Until such time as these expressions can be found, 
we must rely on Monte Carlo evaluations for a description of these properties. 
Little or no work has yet been done regarding truncation of sequential tests of 
composite hypotheses. Again, until such expressions are available, we must rely 
on Monte Carlo studies to show us the effect of truncation on the OC and ASN 
functions. 

(iv) Grouping. These techniques were originally designed for the sampling 
of ballistic missiles, items which involve considerable expense. However, for a 
low-cost, high-volume process, sequential sampling by groups might be preferable 
to item-by-item sampling. Except for a few isolated cases like the binomial, no
optimum procedures have been worked out for sequential sampling by groups.
The general procedure recommended by Wald, in our case, would be to take
groups of say $m$ observations per group and compare the resultant $\chi_n^2$ or $T_n^2$,
as the case may be, with the corresponding $\underline{\chi}_n^2$ and $\overline{\chi}_n^2$, or $\underline{T}_n^2$ and $\overline{T}_n^2$, where $n$ is
now equal to $m, 2m, 3m, \cdots$, etc. The effect of this procedure is to increase
the average sample number and to decrease the size of $\alpha$ and $\beta$. Except for empirical
studies, the magnitudes of these changes are unknown but the directions
of the changes are such that they compensate for each other to some extent.

ESTIMATING THE PARAMETERS OF NEGATIVE EXPONENTIAL
POPULATIONS FROM ONE OR TWO ORDER STATISTICS


By H. Leon Harter 
Aeronautical Research Laboratory, Wright-Patterson Air Force Base 


0. Summary. This paper discusses the use of order statistics in estimating the
parameters of (negative) exponential populations. For the one-parameter
exponential population, the best linear unbiased estimators, $\hat{\sigma}_k = c_kx_k$ and
$\hat{\sigma}_{lm} = c_lx_l + c_mx_m$, of the parameter $\sigma$ are given, based on one order statistic
$x_k$ and on two order statistics $x_l$ and $x_m$. For samples of any size up through
$n = 100$, a table is given of $k$, $l$, and $m$ and of the coefficients $c_k$, $c_l$, and $c_m$,
together with the coefficients of $\sigma^2$ in the variances $V_k$ and $V_{lm}$ of the estimators,
and the corresponding efficiencies $E_k$ and $E_{lm}$ (relative to the best linear unbiased
estimator based on all order statistics). For the two-parameter exponential
population, the best linear unbiased estimators, $\hat{\alpha} = c_{\alpha l}x_l + c_{\alpha m}x_m$, $\hat{\sigma} = c_{\sigma l}x_l +
c_{\sigma m}x_m$, and $\hat{\mu} = c_{\mu l}x_l + c_{\mu m}x_m$, of the parameters $\alpha$ and $\sigma$ and the mean $\mu =
\alpha + \sigma$ are given, based on two order statistics $x_l$ and $x_m$. For samples of any
size up through $n = 100$, a table is given of $m$ ($l$ is always 1 for the best estimator)
and of factors $c_\alpha$ and $c_\sigma$ for computing the coefficients $c_{\alpha l} = 1 + c_\alpha$, $c_{\alpha m} = -c_\alpha$,
$c_{\sigma l} = -c_\sigma$, $c_{\sigma m} = c_\sigma$, $c_{\mu l} = 1 + c_\alpha - c_\sigma$, and $c_{\mu m} = c_\sigma - c_\alpha$, together with
the coefficients of $\sigma^2$ in the variances $V_{\hat{\alpha}}$, $V_{\hat{\sigma}}$, and $V_{\hat{\mu}}$ of the estimators, and
the corresponding efficiencies $E_{\hat{\alpha}}$, $E_{\hat{\sigma}}$, and $E_{\hat{\mu}}$ (relative to the best linear unbiased
estimators based on all order statistics).


1. Introduction. Since the publication, in 1946 and 1948 respectively, of papers 
by Mosteller [4] and by Wilks [10], a great deal of attention has been given to 
the use of order statistics in various statistical procedures, including the estima- 
tion of the parameters of various populations. Among the first to use this method 
for exponential populations was Halperin [2] in 1952. Since that time, Epstein 
and Sobel [1], Sarhan [5], [6], Sarhan and Greenberg [7], [8], and Sarhan, Green- 
berg, and Ogawa [9] have discussed various aspects of the use of order statistics 
in the estimation of the parameters of exponential populations. Most of these 
authors have considered best linear unbiased estimators based on all order 
statistics, or, in the case of truncated or censored samples, on all available order 
statistics. The last of the above papers includes simplified estimators based on 
two order statistics, with tables for samples of any size up through n = 20. 
The present paper contains more accurate tables for samples of any size up 
through n = 100, not only for estimators based on two order statistics, but also, 
in the case of the one-parameter exponential population, for estimators based 
on one order statistic. 

In a previous paper, the author [3] studied estimators of the standard deviation
of normal, rectangular, and one-parameter exponential populations based on
sample ranges and quasi-ranges. The results were quite satisfactory for the 
symmetric populations, but not for the one-parameter exponential, for which 
it was found that estimators based on a single order statistic are more efficient. 
That discovery led to the further research reported in this paper. 


2. Estimators of $\sigma$ for the One-Parameter Exponential Population.

2.1. Estimators based on one order statistic. For the one-parameter exponential
population with parameter $\sigma$, which is both the mean and the standard deviation
of the population, the probability density function $f_1(x)$ is equal to $(1/\sigma)\exp
(-x/\sigma)$ for $0 \le x < \infty$, and zero elsewhere. The sample mean $\bar{x}$, which has
variance $\sigma^2/n$, is the minimum variance unbiased estimator, and also the maximum
likelihood estimator, of the parameter $\sigma$. The expected value and the
variance of the $k$th order statistic, $x_k$, of a sample of size $n$ from this population
are given (see Epstein and Sobel [1]) by

$$E(x_k) = \sigma \sum_{i=1}^{k} a_i, \tag{1}$$

$$\operatorname{var} x_k = \sigma^2 \sum_{i=1}^{k} a_i^2, \tag{2}$$

where $a_i = 1/(n - i + 1)$. An unbiased estimator of the parameter $\sigma$, based
on the order statistic $x_k$, is given by $\hat{\sigma}_k = c_kx_k$, where

$$c_k = 1 \Big/ \sum_{i=1}^{k} a_i. \tag{3}$$

The variance of this estimator is given by

$$V_k = \sigma^2 \sum_{i=1}^{k} a_i^2 \Big/ \Big(\sum_{i=1}^{k} a_i\Big)^2, \tag{4}$$

and its efficiency (relative to the minimum variance unbiased estimator $\bar{x}$) is
$E_k = \operatorname{var} \bar{x}/V_k$, where, as mentioned above, $\operatorname{var} \bar{x} = \sigma^2/n$. Thus the relative
efficiency $E_k$ is given by

$$E_k = \Big(\sum_{i=1}^{k} a_i\Big)^2 \Big/ \Big(n \sum_{i=1}^{k} a_i^2\Big). \tag{5}$$

The best estimator of $\sigma$, based on one order statistic $x_k$, is the one for that value
of $k$ which minimizes $V_k$ (maximizes $E_k$). The author is not aware of any analytical
method for determining the value of $k$ which yields the best estimator of $\sigma$;
hence, for each value of $n$, $V_k$ was computed for $k = 1(1)n$. When the best value
of $k$ for a given $n$ had been found, the corresponding $c_k$ and $E_k$ were also computed.
The computations were performed on the IBM 1620 computer. Table 1
gives, for $n = 1(1)100$, the value of $k$ for the best estimator of $\sigma$, the coefficient
$c_k$ (to 6 significant figures), the coefficient, $V_k/\sigma^2$, of $\sigma^2$ in the variance $V_k$ of the
estimator (to 7 significant figures or 6 decimal places, whichever is less accurate),
and the relative efficiency $E_k$ (to 5 significant figures). The tabular values of
$c_k$, $V_k/\sigma^2$, and $E_k$ are accurate to within a unit in the last place given.
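
The search over $k$ described above is easy to reproduce. The following is a minimal sketch of equations (3) to (5), not the original IBM 1620 program; the function name `best_single_order_statistic` is hypothetical and ordinary floating-point sums are used.

```python
def best_single_order_statistic(n):
    """Return (k, c_k, V_k/sigma^2, E_k) for the k that minimizes V_k.

    Uses a_i = 1/(n - i + 1) and equations (3)-(5); efficiency is
    returned as a fraction, not a percentage.
    """
    best = None
    sum_a = 0.0   # running sum of a_i, i = 1..k
    sum_a2 = 0.0  # running sum of a_i^2, i = 1..k
    for k in range(1, n + 1):
        a_k = 1.0 / (n - k + 1)
        sum_a += a_k
        sum_a2 += a_k * a_k
        v = sum_a2 / sum_a ** 2               # equation (4) divided by sigma^2
        if best is None or v < best[2]:
            best = (k, 1.0 / sum_a, v, 1.0 / (n * v))  # (3) and (5)
    return best
```

For $n = 2$ the search selects $k = 2$, and for $n = 100$ it can be compared with the values $k = 80$ and $c_k = 0.629074$ quoted in the Remarks.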

2.2. Estimators based on two order statistics. Unbiased linear estimators of the
parameter $\sigma$, based on two order statistics $x_l$ and $x_m$, are given by $\hat{\sigma}_{lm} = c_lx_l +
c_mx_m$, where $c_lE(x_l) + c_mE(x_m) = \sigma$, with $E(x_l)$ and $E(x_m)$ given by equation
(1), if $k$ takes the values $l$ and $m$. The variance of such an estimator is given by

$$V_{lm} = c_l^2 \operatorname{var} x_l + c_m^2 \operatorname{var} x_m + 2c_lc_m \operatorname{cov}(x_l, x_m), \tag{6}$$

where $\operatorname{var} x_l$ and $\operatorname{var} x_m$ are given by equation (2), if $k$ takes the values $l$ and $m$,
and $\operatorname{cov}(x_l, x_m)$ is given (see Sarhan [5]) by

$$\operatorname{cov}(x_l, x_m) = \sigma^2 \sum_{i=1}^{l} a_i^2 = \operatorname{var} x_l, \quad (l < m). \tag{7}$$

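It can be shown that, for given values of $l$ and $m$, the values of $c_l$ and $c_m$ which
yield the unbiased estimator $\hat{\sigma}_{lm}$ with minimum variance are

$$c_l = 1 \Big/ \Big(\sum_{i=1}^{l} a_i + \lambda \sum_{i=1}^{m} a_i\Big) \tag{8}$$

and

$$c_m = \lambda c_l, \tag{9}$$

where

$$\lambda = \sum_{i=1}^{l} a_i^2 \sum_{i=l+1}^{m} a_i \Big/ \Big(\sum_{i=1}^{l} a_i \sum_{i=1}^{m} a_i^2 - \sum_{i=1}^{m} a_i \sum_{i=1}^{l} a_i^2\Big). \tag{10}$$

By substituting from equations (2) and (7)-(9) into equation (6), one finds
that the minimum variance of $\hat{\sigma}_{lm}$, for given $l$ and $m$, is

$$V_{lm} = \sigma^2 \Big[(1 + 2\lambda) \sum_{i=1}^{l} a_i^2 + \lambda^2 \sum_{i=1}^{m} a_i^2\Big] \Big/ \Big(\sum_{i=1}^{l} a_i + \lambda \sum_{i=1}^{m} a_i\Big)^2. \tag{11}$$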

The efficiency of this estimator (relative to the minimum variance unbiased
estimator $\bar{x}$) is $E_{lm} = \operatorname{var} \bar{x}/V_{lm}$, where, as before, $\operatorname{var} \bar{x} = \sigma^2/n$. Thus the relative
efficiency $E_{lm}$ is given by

$$E_{lm} = \Big(\sum_{i=1}^{l} a_i + \lambda \sum_{i=1}^{m} a_i\Big)^2 \Big/ n\Big[(1 + 2\lambda) \sum_{i=1}^{l} a_i^2 + \lambda^2 \sum_{i=1}^{m} a_i^2\Big]. \tag{12}$$

The best estimator of $\sigma$, based on two order statistics $x_l$ and $x_m$, is the one for
those values of $l$ and $m$ which minimize $V_{lm}$ (maximize $E_{lm}$). The author is not
aware of any analytical method for determining these values of $l$ and $m$; hence,
for each value of $n$, $V_{lm}$ was computed for $l = 1(1)(n - 1)$ and $m = (l + 1)
(1)n$. When the best values of $l$ and $m$ for a given $n$ had been found, the corresponding
$c_l$, $c_m$, and $E_{lm}$ were also computed. The computations were performed
on the IBM 1620 computer. Table 1 gives, for $n = 2(1)100$, the values of $l$ and
$m$ for the best estimator of $\sigma$, the coefficients $c_l$ and $c_m$ (to 6 significant figures),
the coefficient, $V_{lm}/\sigma^2$, of $\sigma^2$ in the variance $V_{lm}$ of the estimator (to 7 decimal
places), and the relative efficiency $E_{lm}$ (to 5 significant figures). The tabular
values of $c_l$, $c_m$, $V_{lm}/\sigma^2$, and $E_{lm}$ are accurate to within a unit in the last place
given.
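
The two-statistic search can be sketched in the same way. The code below is an illustration rather than the author's program: it minimizes the variance (6) subject to unbiasedness for each pair $(l, m)$, writing the ratio $c_m/c_l$ as `lam`, whose closed form is obtained here by setting the derivative of the variance to zero. The function name `best_pair_of_order_statistics` and the dictionary keys are hypothetical.

```python
def best_pair_of_order_statistics(n):
    """Search l < m for the minimum-variance unbiased estimator c_l*x_l + c_m*x_m.

    S[k] and A[k] hold the sums of a_i and a_i^2 for i = 1..k,
    with a_i = 1/(n - i + 1). Efficiency is returned as a fraction.
    """
    S, A = [0.0], [0.0]
    for i in range(1, n + 1):
        a_i = 1.0 / (n - i + 1)
        S.append(S[-1] + a_i)
        A.append(A[-1] + a_i * a_i)
    best = None
    for l in range(1, n):
        for m in range(l + 1, n + 1):
            # ratio c_m / c_l that minimizes the variance for this (l, m)
            lam = A[l] * (S[m] - S[l]) / (A[m] * S[l] - A[l] * S[m])
            c_l = 1.0 / (S[l] + lam * S[m])          # unbiasedness constraint
            v = ((1.0 + 2.0 * lam) * A[l] + lam * lam * A[m]) * c_l ** 2
            if best is None or v < best["V"]:
                best = {"l": l, "m": m, "c_l": c_l, "c_m": lam * c_l,
                        "V": v, "E": 1.0 / (n * v)}
    return best
```

The double loop costs on the order of $n^2/2$ evaluations, which is trivial for $n \le 100$; for $n = 100$ the result can be compared with the values quoted in the Remarks.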

3. Estimators of Parameters $\alpha$, $\sigma$ and Mean $\mu$ for the Two-Parameter
Exponential Population from Two Order Statistics. For the two-parameter
exponential population with parameters $\alpha$ and $\sigma$, the probability
density function $f_2(x)$ is $(1/\sigma)\exp[-(x - \alpha)/\sigma]$ for $\alpha \le x < \infty$, and zero elsewhere.
The mean of this population, which will be denoted by $\mu$, is equal to
$\alpha + \sigma$. For a sample of size $n$ from this population, the expected value of the
$k$th order statistic, $x_k$, exceeds by $\alpha$ the value given in equation (1) for the one-parameter
exponential population, and thus is given by

$$E(x_k) = \alpha + \sigma \sum_{i=1}^{k} a_i. \tag{13}$$

The variance of $x_k$ and the covariance of $x_l$ and $x_m$ are the same as for the one-parameter
exponential population, and hence are given by equations (2) and
(7), respectively.

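Unbiased linear estimators of the parameters $\alpha$ and $\sigma$ and the mean $\mu$ may be
obtained from any two order statistics $x_l$ and $x_m$. These estimators are of the
form

$$\hat{\alpha} = c_{\alpha l}x_l + c_{\alpha m}x_m, \tag{14}$$

$$\hat{\sigma} = c_{\sigma l}x_l + c_{\sigma m}x_m, \tag{15}$$

and

$$\hat{\mu} = c_{\mu l}x_l + c_{\mu m}x_m. \tag{16}$$

It has been shown (see Sarhan, Greenberg, and Ogawa [9]) that, for given $l$ and
$m$, the coefficients in the best linear estimators based on two order statistics
$x_l$ and $x_m$ are given by

$$c_{\alpha l} = 1 + c_\alpha, \quad c_{\alpha m} = -c_\alpha, \tag{17}$$

$$c_{\sigma l} = -c_\sigma, \quad c_{\sigma m} = c_\sigma, \tag{18}$$

and

$$c_{\mu l} = 1 + c_\alpha - c_\sigma, \quad c_{\mu m} = c_\sigma - c_\alpha, \tag{19}$$

where

$$c_\alpha = \sum_{i=1}^{l} a_i \Big/ \sum_{i=l+1}^{m} a_i \tag{20}$$

and

$$c_\sigma = 1 \Big/ \sum_{i=l+1}^{m} a_i. \tag{21}$$

The variance of the estimator $\hat{\sigma}$ is given by

$$V_{\hat{\sigma}} = \sigma^2 \sum_{i=l+1}^{m} a_i^2 \Big/ \Big(\sum_{i=l+1}^{m} a_i\Big)^2, \tag{22}$$

and the variances of the estimators $\hat{\alpha}$ and $\hat{\mu}$ are given by

$$V_{\hat{\alpha}} = \sigma^2 \sum_{i=1}^{l} a_i^2 + \Big(\sum_{i=1}^{l} a_i\Big)^2 V_{\hat{\sigma}} \tag{23}$$

and

$$V_{\hat{\mu}} = \sigma^2 \sum_{i=1}^{l} a_i^2 + \Big(\sum_{i=1}^{l} a_i - 1\Big)^2 V_{\hat{\sigma}} = \sigma^2 \Big[\sum_{i=1}^{l} a_i^2 + \Big(\sum_{i=1}^{l} a_i - 1\Big)^2 \sum_{i=l+1}^{m} a_i^2 \Big/ \Big(\sum_{i=l+1}^{m} a_i\Big)^2\Big]. \tag{24}$$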


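The best linear unbiased estimators of $\alpha$, $\sigma$, and $\mu$ based on all order statistics
(see Sarhan and Greenberg [7]) are

$$\tilde{\alpha} = \Big[(n^2 - 1)x_1 - \sum_{i=2}^{n} x_i\Big] \Big/ [n(n - 1)], \tag{25}$$

$$\tilde{\sigma} = \Big[\sum_{i=2}^{n} x_i - (n - 1)x_1\Big] \Big/ (n - 1), \tag{26}$$

and

$$\tilde{\mu} = \sum_{i=1}^{n} x_i \Big/ n. \tag{27}$$

These are also the maximum likelihood estimators. Their variances are

$$V_{\tilde{\alpha}} = \sigma^2/[n(n - 1)], \tag{28}$$

$$V_{\tilde{\sigma}} = \sigma^2/(n - 1), \tag{29}$$

and

$$V_{\tilde{\mu}} = \sigma^2/n. \tag{30}$$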

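The efficiencies of the estimators $\hat{\alpha}$, $\hat{\sigma}$, and $\hat{\mu}$ (relative to the best linear unbiased
estimators $\tilde{\alpha}$, $\tilde{\sigma}$, and $\tilde{\mu}$ based on all order statistics) are given by

$$E_{\hat{\alpha}} = V_{\tilde{\alpha}}/V_{\hat{\alpha}} = \Big(\sum_{i=l+1}^{m} a_i\Big)^2 \Big/ n(n - 1)\Big[\sum_{i=1}^{l} a_i^2 \Big(\sum_{i=l+1}^{m} a_i\Big)^2 + \Big(\sum_{i=1}^{l} a_i\Big)^2 \sum_{i=l+1}^{m} a_i^2\Big], \tag{31}$$

$$E_{\hat{\sigma}} = V_{\tilde{\sigma}}/V_{\hat{\sigma}} = \Big(\sum_{i=l+1}^{m} a_i\Big)^2 \Big/ \Big[(n - 1) \sum_{i=l+1}^{m} a_i^2\Big], \tag{32}$$

and

$$E_{\hat{\mu}} = V_{\tilde{\mu}}/V_{\hat{\mu}} = \Big(\sum_{i=l+1}^{m} a_i\Big)^2 \Big/ n\Big[\sum_{i=1}^{l} a_i^2 \Big(\sum_{i=l+1}^{m} a_i\Big)^2 + \Big(\sum_{i=1}^{l} a_i - 1\Big)^2 \sum_{i=l+1}^{m} a_i^2\Big]. \tag{33}$$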


The best estimators of $\alpha$, $\sigma$, and $\mu$, based on two order statistics $x_l$ and $x_m$, are
those for the values of $l$ and $m$ which minimize $V_{\hat{\alpha}}$, $V_{\hat{\sigma}}$, and $V_{\hat{\mu}}$ (maximize
$E_{\hat{\alpha}}$, $E_{\hat{\sigma}}$, and $E_{\hat{\mu}}$). It can be shown that, for a fixed value of $m$, the variances of
the estimators are smallest when $l = 1$. It can be seen from equations (23) and
(24) that, for a fixed value of $l$, the value of $m$ which minimizes $V_{\hat{\sigma}}$ also minimizes
$V_{\hat{\alpha}}$ and $V_{\hat{\mu}}$. The author is not aware of any purely analytical method of
determining the best value of $m$; hence, for each value of $n$, $V_{\hat{\sigma}}$ was computed for
$l = 1$ and $m = 2(1)n$. When the best value of $m$ for a given $n$ had been found,
the corresponding $c_\alpha$, $c_\sigma$, $V_{\hat{\alpha}}$, $V_{\hat{\mu}}$, $E_{\hat{\alpha}}$, $E_{\hat{\sigma}}$, and $E_{\hat{\mu}}$ were also computed. The
computations were performed on the IBM 1620 computer. Table 2 gives, for
$n = 2(1)100$, the value of $m$ for the best estimators of $\alpha$, $\sigma$, and $\mu$, the factors
$c_\alpha$ and $c_\sigma$ (to 6 significant figures or 6 decimal places, whichever is less accurate),
the coefficient, $V_{\hat{\alpha}}/\sigma^2$, of $\sigma^2$ in the variance $V_{\hat{\alpha}}$ (to 7 significant figures or 9 decimal
places, whichever is less accurate), the coefficients, $V_{\hat{\sigma}}/\sigma^2$ and $V_{\hat{\mu}}/\sigma^2$, of $\sigma^2$ in the
variances $V_{\hat{\sigma}}$ and $V_{\hat{\mu}}$ (to 7 significant figures or 7 decimal places, whichever is
less accurate), and the relative efficiencies $E_{\hat{\alpha}}$, $E_{\hat{\sigma}}$, and $E_{\hat{\mu}}$ (to 5 significant
figures). The tabular values of $c_\alpha$, $c_\sigma$, $V_{\hat{\alpha}}/\sigma^2$, $V_{\hat{\sigma}}/\sigma^2$, $V_{\hat{\mu}}/\sigma^2$, $E_{\hat{\alpha}}$, $E_{\hat{\sigma}}$, and $E_{\hat{\mu}}$ are
accurate to within a unit in the last place given.
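
A minimal sketch of the Table 2 computation follows. It fixes $l = 1$ as stated above and searches over $m$. The factors $c_\alpha$ and $c_\sigma$ are taken from the unbiasedness conditions implied by (13) (that is, $c_\sigma = 1/\sum_{i=l+1}^{m} a_i$ and $c_\alpha = \sum_{i=1}^{l} a_i \cdot c_\sigma$), which is an assumption of this sketch; the function name and dictionary keys are hypothetical.

```python
def best_two_parameter_estimators(n):
    """With l = 1, find the m in 2..n that minimizes the variance of sigma-hat.

    Variances are returned as coefficients of sigma^2 and efficiencies as
    fractions, relative to the all-order-statistics variances (28)-(30).
    """
    a = [1.0 / (n - i + 1) for i in range(1, n + 1)]  # a[0] is a_1
    s_l = a[0]           # sum of a_i for i = 1..l with l = 1
    q_l = a[0] ** 2      # sum of a_i^2 for i = 1..l
    best = None
    s = q = 0.0          # running sums of a_i and a_i^2 for i = l+1..m
    for m in range(2, n + 1):
        s += a[m - 1]
        q += a[m - 1] ** 2
        v_sigma = q / s ** 2                         # equation (22)
        if best is None or v_sigma < best["V_sigma"]:
            v_alpha = q_l + s_l ** 2 * v_sigma
            v_mu = q_l + (s_l - 1.0) ** 2 * v_sigma
            best = {"m": m, "c_alpha": s_l / s, "c_sigma": 1.0 / s,
                    "V_alpha": v_alpha, "V_sigma": v_sigma, "V_mu": v_mu,
                    "E_alpha": 1.0 / (n * (n - 1) * v_alpha),
                    "E_sigma": 1.0 / ((n - 1) * v_sigma),
                    "E_mu": 1.0 / (n * v_mu)}
    return best
```

For $n = 2$ the two order statistics are the whole sample, so all three efficiencies returned are 1.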


4. Remarks. 

(i) The variance of the best estimator of $\sigma$ for the two-parameter exponential
population based on two order statistics $x_l$ and $x_m$ from a sample of size $n$ is the
same as the variance of the best estimator of $\sigma$ for the one-parameter exponential
population based on one order statistic $x_k$ from a sample of size $n - 1$, with
$k = m - 1$. This can be seen by a comparison of equations (4) and (22), though
the author did not observe this fact until confronted with equal numerical
values. The relative efficiencies of these estimators are also equal, since in each
case the variance of the best linear unbiased estimator based on all order statistics
is $\sigma^2/(n - 1)$.
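
The comparison in Remark (i) can be written out in one line. In the sketch below the superscript on $a_i^{(n)} = 1/(n - i + 1)$ is notation introduced here to show the sample size, and $l = 1$ is assumed as for the best estimators.

```latex
\sum_{i=2}^{m} a_i^{(n)}
  = \sum_{i=2}^{m} \frac{1}{n - i + 1}
  = \sum_{j=1}^{m-1} \frac{1}{(n - 1) - j + 1}
  = \sum_{j=1}^{m-1} a_j^{(n-1)} .
```

The same change of index $j = i - 1$ applies to the sums of squares, so the ratio that defines the variance in (22) for sample size $n$ with $l = 1$ coincides with the ratio in (4) for sample size $n - 1$ with $k = m - 1$.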

(ii) For the one-parameter exponential population, the values $k = 80$, $l = 64$,
and $m = 93$, with coefficients $c_k = 0.629074$, $c_l = 0.522657$, and $c_m = 0.181399$
and relative efficiencies $E_k = 65.093\%$ and $E_{lm} = 82.460\%$, for $n = 100$ may
be compared with the results obtained by Sarhan, Greenberg, and Ogawa [9],
whose corresponding asymptotic values are $0.7968n$, $0.6386n$, $0.9266n$, 0.6275,
0.5232, 0.1790, 64.76%, and 82.03%.

(iii) For the two-parameter exponential population, it can be seen from
equations (20) and (21) that, since $l = 1$ for the best estimators, $c_\alpha = a_1c_\sigma =
c_\sigma/n$. For convenience, however, separate columns for $c_\alpha$ and $c_\sigma$ are given in
Table 2.

TABLE 1

Best Estimators of Parameter σ of 1-Parameter Exponential Population

Columns 2-5: from 1 order statistic, x_k. Columns 6-11: from 2 order statistics, x_l and x_m.

 n |  k | c_k     | V_k/σ²   | E_k (%) |  l |  m | c_l     | c_m     | V_lm/σ²  | E_lm (%)

 1 |    | 1.00000 | 1.000000 | 100.00  |    |    |         |         |          |
 2 |    | .666667 | .5555556 | 90.000  |    |  2 | .500000 | .500000 | .5000000 |
 3 |    | .545455 | .4049587 | 82.313  |    |  3 | .447368 | .342105 | .3421053 |
 4 |    | .480000 | .3280000 | 76.220  |    |    | .413043 | .265217 | .2652174 |
 5 |    | .437956 | .2807289 | 71.243  |    |    | .527997 | .256818 | .2140154 |

 6 |    | .689655 | .2337165 | 71.311  |    |  6 | .493892 | .216654 | .1805452 |
 7 |    | .627803 | .2017178 | 70.820  |    |    | .467528 | .188618 | .1571815 |
 8 |    | .582121 | .1787245 | 69.940  |    |    | .553626 | .187760 | .1393976 |
 9 |    | .546756 | .1613595 | 68.859  |    |  9 | .527062 | .167990 | .1247200 |
10 |    | .699806 | .1468046 | 68.118  |    |    | .505032 | .152501 | .1132201 |

11 |    | .657948 | .1333457 | 68.175  |    |    | .486391 | .140031 | .1039622 |
12 |    | .623748 | .1225454 | 68.002  |    |    | .552632 | .140623 | .0960924 |
13 |    | .595191 | .1136773 | 67.668  |    |    | .533452 | .130469 | .0891540 |
14 |    | .570919 | .1062578 | 67.222  |    |    | .516702 | .121903 | .0833001 |
15 |    | .673448 | .0994728 | 67.020  |    |    | .501916 | .114575 | .0782930 |

16 |    | .646247 | .0932310 | 67.038  |    |    | .519329 | .217012 | .0735428 |
17 |    | .622580 | .0878686 | 66.945  |    |    | .506584 | .204426 | .069277  |
18 |    | .601766 | .0832093 | 66.766  |    |    | .495052 | .193425 | .0655497 |
19 |    | .583292 | .0791212 | 66.520  |    |    | .531320 | .193368 | .0621307 |
20 |    | .660325 | .0752377 | 66.456  |    |    | .519816 | .183870 | .0590788 |

21 |    | .640194 | .0716497 | 66.461  |    |    | .509272 | .175398 | .0563567 |
22 |    | .622092 | .0684545 | 66.401  |    |    | .542190 | .175587 | .0539094 |
23 |    | .605709 | .0655900 | 66.288  |    |    | .531650 | .168092 | .0516082 |
24 |    | .590798 | .0630065 | 66.131  |    |    | .521895 | .161307 | .0495251 |
25 |    | .652475 | .0605006 | 66.115  |    |    | .512834 | .155136 | .0476304 |

26 |    | .636502 | .0581740 | 66.115  |    |    | .542341 | .155498 | .0458907 |
27 |    | .621843 | .0560556 | 66.072  |    | 26 | .533236 | .149905 | .0442401 |
28 |    | .608333 | .0541184 | 65.993  |    |    | .524718 | .144765 | .0427233 |
29 |    | .595834 | .0523395 | 65.883  |    |    | .529588 | .203622 | .0412679 |
30 |    | .647255 | .0505915 | 65.887  |    |    | .521922 | .196822 | .0398898 |

31 |    | .634017 | .0489616 | 65.884  |    |    | .514693 | .190530 | .0386145 |
32 |    | .621699 | .0474550 | 65.852  |    |    | .507861 | .184689 | .0374308 |
33 |    | .610203 | .0460582 | 65.793  |    |    | .529266 | .184782 | .0363033 |
34 |    | .599445 | .0447593 | 65.711  |    | 32 | .522441 | .179406 | .0352472 |
35 |    | .643532 | .0434715 | 65.724  | 23 |    | .515963 | .174384 | .0342605 |

36 |    | .632230 | .0422665 | 65.721  |    |    | .536163 | .174543 | .0333365 |
37 |    | .621609 | .0411405 | 65.694  |    |    | .629688 | .169881 | .0324462 |
38 |    | .611604 | .0400859 | 65.649  |    |    | .523519 | .165502 | .0316097 |
39 |    | .651175 | .0390941 | 65.588  |    |    | .517635 | .161380 | .0308224 |
.640744 -0381083 | 65.603 | 26 | ¢ -536491 | .161608 | .0300780 | 
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TABLE 1 (Continued) 


From 1 Order Statistic, x From 2 Order Statistics, x; and 2m 


Vi/o? Ex(%) C1 Su Vim/o? Ewm([%) 


. 630885 .0371813 | 65.598 | : 39 | .530595 | .157745 | .0293590 | 83.076 
.621548 .0363080 | 65.577 | : 39 | .511693 | .194092 | .0286687 | 83.051 
. 612692 .0354837 | 65.539 |: .527739 | .194098 | .0280016 | 83.052 
.647770 .0346995 | 65.497 | : .522497 | .189685 | .0273649 | 83.053 
. 638578 0339230 | 65.! .517464 | .185500 | .0267612 | 83.039 


. 629834 .0331879 ‘ 3 | .512624 | .181525 | .0261877 | 83.013 
- 621506 .0324908 . 4 | .527790 | .181619 | .0256330 | 83.005 
.613561 .0318289 ’ 5 | .522953 | .177867 | .0251035 | 82.990 
- 645063 .0311934 ‘ y .518293 | .174292 | .0245990 | 82.963 
. 636847 .0305660 | 65. 3: .532844 | .174417 | .0241175 | 82.927 


.628992 | .0299688 ‘ .528185 | .171029 | .0236490 | 82.912 
.621475 | .0293996 34 | .523688 | .167791 | .0232014 | 82.886 
.614272 | .0288564 | 65.386 519344 | .164695 | .0227732 | 82.851 
.642858 | .0283310 , 35 | 51 | .583183 | .164857 | .0223621 | 82. 

.635431 | .0278136 } 51 | .518358 | .192720 | .0219586 | 82.800 


or gor or or or 
Rk ote 


- 628302 .0273189 | 65. | ¢ { .514374 | .189295 | .0215684 | 82. 
.621452 | .0268453 ‘ 36 | .526631 | .189331 | .0211879 | 82. 
.614863 | .0263915 | 65. | .522652 | .186069 | .0208246 
.641028 .0259499 ¥ ‘ 55 | .518794 | .182935 | .0204738 
634252 .0255159 .515050 | .179922 | .0201366 | 82. 


2 8 


Qo 


-627725 | .0250994 . | 39 | .526789 | .180005 | .0198076 | 82. 
.621434 | .0246992 ‘ 58 | .523047 | .177123 | .0194905 | 82. 
.615363 | .0243145 . .519411 | .174347 | .0191850 | 82. 
. 639486 .0239381 | 65.: | .530779 | .174447 | .0188902 | 82. 
. 633255 .0235689 ; 4: j .527144 | .171785 | .0186019 | 82. 


627237 .0232133 a : .523609 | .169216 | .0183238 | 82. 
.621420 | .0228707 ; .520168 | .166736 | .0180552 | 82. 
.615792 | .0225404 . ‘ 33. | .522477 | .191905 | .0177944 | 82.64: 
.638167 | .0222158 | 65. ‘ ) .519185 | .189124 | .0175365 

. 632401 .0218979 . é 55 | .515976 | .186434 | .0172871 | 82. 


.626818 .0215909 a f .525891 | .186478 | .0170434 | 82.6: 
.621409 .0212943 . 22: j . 522686 | .183891 | .0168070 
.616163 .0210076 | 65.: j .519559 | .181386 | .0165781 | 82. 
.637027 .0207248 ; .516508 | .178961 | .0163564 | 82. 
.631661 .0204481 .526081 | .179033 | .0161388 | 82. 




. 626455 .0201804 
.621399 .0199211 
-616488 .0196699 
.636031 .0194214 
.631014 .0191784 




.523031 | .176693 | .0159279 | 82.609 
.520052 | .174423 | .0157233 | 82.597 
.529377 | .174507 | .0155244 | 82.583 
.526398 | .172314 | .0153294 | 82.575 
.523487 | .170185 | .0151399 | 82.563 










EXPONENTIAL ORDER STATISTICS ESTIMATION 


TABLE 1 (concluded) 


From 1 Order Statistic, x, From 2 Order Statistics, x; and x, 


Ex (%) l | | Ci Cm Vim/o* Eim (%) 


.0189428 | 65.173 | 53 | | .520640 | .168116 | .0149559 | 82.547 
- 621392 .0187142 | 65.165 | 52 3} | .522467 | .189026 | .0147749 | 82.540 
-616774 -0184924 | 65.152 | 53 | .519723 | .186749 | .0145969 | 82.539 
-635154 .0182722 | 65.152 | 54 | .517038 | .184536 | .0144240 | 82.535 
-630443 | .0180572 | 65.153 | 54 . 525362 | .184580 | .0142541 | 82.536 


. 625856 .0178483 | 65.149 | 55 | .522678 | .182437 | .0140886 | 82.534 
-621385 .0176452 | 65.141 | 56 | .520050 | .180352 | .0139276 | 82.529 
-617028 | .0174478 | 65.129 | 56 | .528185 | .180407 | .0137707 | 82.520 
-634376 | .0172514 | 65.131 | 57 | .525558 | .178385 | .0136163 | 82.518 
.629936 .0170598 | 65.130 | 58 . 522984 | .176415 | .0134660 | 82.512 


625605 .0168733 | 65.127 | 59 | .520461 | .174496 | .0133195 | 82.503 
.621380 .0166917 | 65.119 | 59 .528366 | .174567 | .0131763 | 82.493 
.617256 .0165150 | 65.109 | 60 | .525843 | .172703 | .0130356 | 82.487 
-633681 .0163387 | 65.111 | 61 | | .§23368 | .170884 | .0128984 | 82.478 
. 629482 0161668 | 65.111 | 62 | 89 | .520941 | .169111 | .0127645 | 82.466 


.625382 | .0159993 | 65.107 | 61 | 522452 | .186991 | .0126315 | 82.466 
-621376 | .0158360 | 65.100 | 62 | 90 | .520100 | .185065 | .0125015 | 82.465 
.637131 | .0156767 | 65.091 | 63 | .517792 | .183185 | .0123745 | 82.461 
.633057 | .0155177 | 65.093 | 92 | .524964 | .183228 | .0122493 | 82.462 
-629074 | .0153627 | 65.093 | 64 | 93 | .522657 | .181399 | .0121271 | 82.460 
TABLE 2
Best Estimators from 2 Order Statistics $x_l$, $x_m$ of Parameters $\alpha$, $\sigma$ and Mean $\mu$ of
2-Parameter Exponential Population


$c_\alpha$ | $c_\sigma$ | $V_\alpha/\sigma^2$ | $E_\alpha$ (%) | $V_\sigma/\sigma^2$ | $E_\sigma$ (%) | $V_\mu/\sigma^2$ | $E_\mu$ (%)


| .500000 1.00000 -5000000 (100.00 (1.000000 100.00 -5000000 |100.00 
| .222222 | .666667 | .1728395 96.429 | .5555556 | 90.000 | .3580247 | 93.103 
| .136364 | .545455 | .08780992 | 94.902 | .4049587 | 82.313 | .2902893 | 86.121 

.096000 | .480000 | .05312000 | 94.127 | .3280000 | 76.220 | .2499200 | 80.026 


.072993 | .437956 | .03557580 | 93.697 | .2807289 | 71.243 | .22272864 | 
.098522 | .689655 | .02517789 | 94.565 | .2337165 | 71.311 | .1921182 
| .078475 | .627803 | .01877684 | 95.102 | .2017178 | 70.820 | .1700652 | 
| .064680 | .582121 | .01455215 | 95.442 | .1787246 | 69.940 | .1535601 
.054676 | .546756 | .01161360 | 95.673 | .1613595 | 68.859 | .1407012 | 




830 


.501 




| .063619 | .699806 | .009477724 | 95.919 | .1468046 | 68. - 1295906 | 
-054829 | .657948 | .007870456 | 96.256 | .1333457 | 68. - 1189919 | 
-047981 | .623748 | .006642280 | 96.507 | .1225454 | 68. . 1103346 | 
| .042514 | .595191 | .005682027 | 96.700 | .1136773 | 67. - 1031197 | 
-038061 | .570919 | .004916701 | 96.852 | .1062578 | 67. .0970068 




-042091 | .673448 | .004294816 | 97.016 | .0994728 | 67. -0913336 | 
| .038015 | .646247 | .003782806 | 97.189 | .0932310 . 0: - 0860455 

.034588 | .622580 | .003357619 | 97.330 | .0878686 d. .0814630 
| .031672 | .601766 | .003000580 | 97.447 | .0832093 | 66. -0774510 | 

-029165 | .583292 | .002697803 | 97.545 | .0791212 | 66. -0739069 | 


| .031444 | .660325 | .002438181 | 97.653 | .0752377 | 66.456 | .0705104 | 67 
.029100 | .640195 | .002214152 | 97.758 | .0716497 ; .0673502 | 

| 027047 | .622092 | .002019763 | 97.847 | .0684545 | 66. .0645217 

| .025238 | .605709 | .001849983 | 97.925 | .0655900 | 66. -0619741 

| .023632 | .590798 | .001700810 | 97.993 | .0630065 : . 0596668 


| 025095 | .652475 | .001568788 | 98.067 | .0605006 0574155 | 

| 023574 | .636502 | .001451542 | 98.137 | .0581740 , .0553164 
.022209 | .621843 | .001347010 | 98.199 | .0560557 .0533987 

| .020977 | .608333 | .001253411 | 98.254 | .0541184 0516395 

| 019861 | .595834 | .001169266 | 98.303 | .0523395 .883 | .0500195 


| .020879 | .647255 | .001093227 | 98.357 | .0505915 ’ - 0484208 
| .019813 | .634017 | .001024377 | 98.408 | .0489616 ‘ . 0469259 
-018839 | .621699 | .000961850 | 98.453 | .0474551 . -0455408 
.017947 | .610203 | .000904895 | 98.494 | .0460582 ‘ . 0442538 
.017127 | .599445 | .000852865 | 98.531 | .0447593 . - 0430545 


.017876 | .643532 | .000805148 | 98.572 | .0434715 | 65. .0418616 
-017087 | .632230 | .000761334 | 98.610 | .0422665 . .0407431 
| .016358 | .621609 | .000721011 | 98.644 | .0411405 ‘ 0396962 
-015682 | .611604 | .000683817 | 98.676 | .0400859 ¥ . 0387140 
.016279 | .651175 | .000649434 | 98.705 | .0390941 .0377889 
TABLE 2 (continued)
$c_\alpha$ | $c_\sigma$ | $V_\alpha/\sigma^2$ | $E_\alpha$ (%) | $V_\sigma/\sigma^2$ | $E_\sigma$ (%) | $V_\mu/\sigma^2$ | $E_\mu$ (%)


.015628 | .640744 | .000617554 | 98.737 | .0381083 | 65.602 | .0368669 | 66.158 
.015021 | .630885 | .000587971 | 98.767 | .0371813 | 65.598 | .0359988 | 66.140 
.014455 | .621548 | .000560469 | 98.794 | .0363080 | 65.577 | .0351797 | 66.106 
.013925 | .612692 | .000534857 | 98.819 | .0354837 | 65.539 | .0344057 | 66.057 
.014395 | .647770 | .000510963 | 98.843 | .0346995 | 65.497 | .0336683 | 66.003 


.013882 | .638578 | .000488621 | 98. | .0339230 | 65.508 | .0329368 | 66.003
013401 | .629835 | .000467717 | 98. .0331879 | 65.503 | .0322434 

012948 | .621506 | .000448130 | 98. .0324909 | 65.485 | .0315852 

.012522 | .613562 | .000429750 | 98. | .0318289 | 65.454 | .0309596 | 
-012901 | .645063 | .000412477 | 98. 0311934 | 65.425 | .0303581 


.012487 | .636847 | .000396219 | | .0305661 | 65.432 | .0297636 | 
.012096 | .628992 | .000380906 | 98. .0299689 | 65.427 | .0291971 
.011726 | .621475 | .000366465 | 99.012 | .0293996 | 65.412 | .0286567 
.011375 | .614272 | .000352831 | 99.029 | .0288564 | 65.385 | .0281405 
.011688 | 642858 | .000339044 | 99.046 | .0283310 | 65.365 | .0276407 





.011347 | .635431 | .000327747 | 99. .0278136 | 65.370 | .0271480 | 
.011023 | .628302 | .000316195 | 99. 0273189 | 65.366 | .0266765 | 
010715 | .621452 | .000305245 .094 | .0268453 | 65.352 | .0262249 | 
010421 | .614864 | .000294855 | 99. | .0263915 | 65.329 | .0257918 


.010684 | .641029 | .000284986 | 99.123 | .0259499 | 65.315 | .0253699 


.010398 | .634253 | .000275602 | 99.137 | .0255159 | 65.319 | 0249549 | 
.010125 | .627726 | .000266675 | 99.151 | .0250994 | 65.314 | .0245564 | 
.009864 | .621434 | 000258176 | 99.164 | .0246992 | 65.302 | .0241733 
.009615 | .615364 | .000250077 | 99.176 | .0243145 | 65.282 | .0238047 
.009838 | .639486 | .000242352 | 99. .0239381 | 65.272 | .0234439 








.009595 | .633256 | .000234979 | 99.200 | .0235689 | 65.275 | .0230896 
.009362 | .627237 | .000227938 | 99. .0232133 | 65.271 | .0227483 
.009139 | .621420 | .000221209 | 99.223 | .0228708 | 65.260 | .0224193 
.008925 | .615792 | .000214774 | 99. .0225404 | 65.242 | .0221018 
-009117 | .638167 | .000208615 | 99.244 | .0222159 | 65.236 | .0217897 


.008907 | .632402 | .000202717 | 99. .0218979 | 65.238 | .0214838 
.008706 | .626818 | .000197066 | 99. .0215909 | 65.234 | .0211882 
.008512 | .621409 | .000191648 | 99. | .0212943 | 65.224 | .0209025 | 
.008327 | .616164 | .000186451 | 99.: .0210076 35.208 | .0206263 
.008494 | .637027 | .000181462 | 99. | .0207248 .204 | .0203536 


.008311 | .631662 | .000176670 | 99.303 | .0204481 .206 | .0200867 
| 62 | .008136 | .626455 | .000172066 | 99. .0201804 .201 | .0198283 
.007967 | .621399 | .000167640 ‘ -0199211 | 65.192 | .0195779 
| 64 | .007804 | .616488 | .000163382 | 99.% .0196699 | 65.178 | .0193353 
| 64 | .007950 | .636031 | .000159285 . .0194214 | 65.177 | .0190951 
TABLE 2 (concluded)

$c_\alpha$ | $c_\sigma$ | $V_\alpha/\sigma^2$ | $V_\sigma/\sigma^2$ | $V_\mu/\sigma^2$

.007790 | .631015 | .000155339 | .0191784 | .0188602
.007636 | .626137 | .000151538 | .0189428 | .0186324
.007487 | .621392 | .000147875 | .0187143 | .0184112
.007343 | .616774 | .000144344 | .0184924 | .0181964
.007472 | .635155 | .000140937 | .0182722 | .0179832

.007331 | .630444 | .000137650 | .0180572 | .0177749
.007194 | .625856 | .000134476 | .0178483 | .0175725
.007061 | .621386 | .000131411 | .0176453 | .0173756
.006933 | .617029 | .000128449 | .0174478 | .0171842
.007049 | .634377 | .000125587 | .0172515 | .0169937

.006922 | .629936 | .000122818 | .0170598 | .0168077
.006800 | .625606 | .000120141 | .0168733 | .0166266
.006682 | .621380 | .000117550 | .0166918 | .0164503
.006567 | .617256 | .000115042 | .0165150 | .0162786
.006670 | .633682 | .000112614 | .0163387 | .0161074

.006557 | .629483 | .000110261 | .0161668 | .0159403
.006447 | .625382 | .000107982 | .0159993 | .0157774
.006341 | .621376 | .000105772 | .0158360 | .0156186
.006436 | .637131 | .000103630 | .0156767 | .0154636
.006331 | .633057 | .000101552 | .0155177 | .0153089

$E_\alpha$ (%): 99.353, 99.360, 99.367, 99.375, 99.382, 99.389, 99.396, 99.402, 99.409
$E_\mu$ (%): 65.381, 65.374, 65.364, 65.351, 65.351, 65.348, 65.342, 65.333, 65.321, 65.321





ON THE TWO SAMPLE PROBLEM: A HEURISTIC METHOD 
FOR CONSTRUCTING TESTS¹


By V. P. GODAMBE
Science College, Nagpur, India 


1. Introduction. The two-sample problem arises as follows. We are given two 
independent samples from populations A and B respectively and are required 
to investigate whether the population A could be considered as identical with 
B. In the usual terminology of hypothesis testing: Given two independent
samples $x_1, \dots, x_m$ and $x_{m+1}, \dots, x_{m+n}$ from populations with unknown
cumulative distribution functions $F$ and $G$ respectively, the problem is to test
the composite hypothesis


$H_0: F = G$
against the alternatives
$H_1: F \neq G$,


F and G being completely or partially unspecified. 

In the following lines we shall discuss a method (subsequently called the
V-method) for testing $H_0$ against $H_1$ when $F$ and $G$ are partially specified
(the exact meaning of this will be clear later). A test for the situation where
$F$ and $G$ are completely unspecified is also put forward.


2. Notation. Suppose $F(x)$ and $G(x)$ to be two cumulative distribution
functions on the real axis, $-\infty < x < \infty$, such that their frequency functions
exist everywhere. Let $x_1, \dots, x_m$ and $x_{m+1}, \dots, x_{m+n}$ denote independent
samples from $F$ and $G$ respectively. Now the combined sample from $F$ and $G$
can be represented as a point


(2.1) $x = (x_1, \dots, x_m, x_{m+1}, \dots, x_{m+n})$


in the $m + n$ dimensional Euclidean space $X$ of all such points. It follows from
the existence of the frequency functions that the probability measure of the set
of points $x$ in $X$ defined by $x_i = x_j$ for $i \neq j$ is zero. Next we define on $X$ a
vector-valued function $y$,


(2.2) $y(x) = (y_1(x), \dots, y_i(x), \dots, y_{m+n}(x))$,


where $y_i(x)$ is the total number of the components of $x$ less than or equal to
$x_i$. Thus $y_i(x)$ is the rank of $x_i$ in the combined sample $x = (x_1, \dots, x_{m+n})$.
Further we arrange the last $n$ components of $x$, that is $x_{m+1}, \dots, x_{m+n}$,
according to their magnitudes as $-\infty < y_1 < \dots < y_n < \infty$ to define another
vector-valued function $a$,


(2.3) $a(x) = (a_1(x), \dots, a_i(x), \dots, a_{n+1}(x))$,


where $a_i(x)$ is the total number of the first $m$ components of $x$ lying between
$y_{i-1}$ and $y_i$, $y_0$ denoting $-\infty$ and $y_{n+1}$ denoting $+\infty$ for convenience. In
addition we define


(2.4) $b(x) = (b_1(x), \dots, b_i(x), \dots, b_{m+1}(x))$,


where $b_i(x)$ denotes the number of individuals out of $x_{m+1}, \dots, x_{m+n}$ lying
between the $(i-1)$st and $i$th ordered individuals from $x_1, \dots, x_m$; $b_1(x)$ and
$b_{m+1}(x)$ being defined analogously to $a_1(x)$ and $a_{n+1}(x)$ in (2.3). Now it is
important to note that, given $a(x)$ in (2.3), $b(x)$ in (2.4) is uniquely determined
and conversely.

For simplicity we write $y$ for $y(x)$, $a$ for $a(x)$, etc. Now $P(y \mid F, G)$ denotes
the probability of obtaining $x$ such that $y(x) = y$ given $F$ and $G$. Similarly,
we have $P(a \mid F, G)$, etc.
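
The counting vectors of this section can be computed mechanically. The following Python sketch is illustrative only: the function names `rank_vector`, `gap_counts`, `a_vector` and `b_vector` are hypothetical and do not come from the paper, and the samples are assumed to contain no ties (an event of probability one under the assumptions above).

```python
from bisect import bisect_left


def rank_vector(x):
    """y_i(x): number of components of x less than or equal to x_i."""
    return [sum(1 for u in x if u <= v) for v in x]


def gap_counts(points, cuts):
    """Count how many `points` fall in each of the len(cuts) + 1 gaps cut out by `cuts`."""
    ordered = sorted(cuts)
    counts = [0] * (len(ordered) + 1)
    for v in points:
        # bisect_left returns the number of cut points strictly below v,
        # which is the 0-based index of the gap containing v (no ties assumed).
        counts[bisect_left(ordered, v)] += 1
    return tuple(counts)


def a_vector(first, second):
    """a(x) of (2.3): first-sample counts between ordered second-sample values."""
    return gap_counts(first, second)


def b_vector(first, second):
    """b(x) of (2.4): second-sample counts between ordered first-sample values."""
    return gap_counts(second, first)


first, second = [1.0, 4.0], [2.0, 3.0, 5.0]   # m = 2, n = 3
print(rank_vector(first + second))            # [1, 4, 2, 3, 5]
print(a_vector(first, second))                # (1, 0, 1, 0)
print(b_vector(first, second))                # (0, 2, 1)
```

In this example the combined ordering is x, y, y, x, y, so either of the two count vectors determines the other, as noted above.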


3. The most powerful rank test. Following the above notation it is easy to
see that


(3.1) $P(y \mid F, F) = 1/(m+n)!$.


Hence the most powerful rank test of the hypothesis $H_0: F = G$ against the
simple alternative $H_1$, that the c.d.f.'s are specifically $F$ and $G$ respectively,
has the critical region


(3.2) $y : P(y \mid F, G) > \text{const.}$


Since hereafter there is no possibility of confusion, we shall write $P(y)$ for
$P(y \mid F, G)$, $P(a)$ for $P(a \mid F, G)$, etc. Now from the definition of
$a = (a_1, \dots, a_{n+1})$


in Section 2 it follows that 


(3.3) P(a) = m!n! P(y), 


which of course is also true under the null hypothesis. Thus the most powerful 
test (3.2) is associated with the critical region 


(3.4) a:P(a) > const. 
Suppose further that we have a function $\theta$ such that


(3.5) $G(x) = \theta(F(x))$²


for every $x$ and
(3.6) $\theta'(F) = (\partial/\partial F)\,\theta(F)$
exists for every $F$, $0 < F < 1$.

² Here one should avoid the mistake of assuming that F is a uniform distribution on
(0, 1).

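THEOREM 3. We have


(3.7) $P(a) = \dfrac{m!\, n!}{\prod_{i=1}^{n+1} a_i!} \int \cdots \int_D \prod_{i=1}^{n+1} p_i^{a_i} \prod_{i=1}^{n} \theta'(p_1 + \dots + p_i) \prod_{i=1}^{n} dp_i$,

where $p_{n+1} = 1 - p_1 - \dots - p_n$ and the domain of integration $D$ is


$D = \{p_1, \dots, p_n : 0 \le p_i \le 1,\ \sum_{i=1}^{n} p_i \le 1,\ i = 1, \dots, n\}$.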


The theorem can be easily derived from a result of Hoeffding's [1], top of p. 88.
It is important to note that, in the above formulae, $P(a)$ depends on $F$ and $G$
only through $\theta$, or rather $\theta'$, and so does the corresponding test (3.4) for testing
against the alternative $G = \theta(F)$. It is, however, seldom possible to evaluate
the integral on the right side of (3.7). For this and other reasons we shall in
the next section put forward another rank test which depends on $\theta$ alone.
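
Since the integral in (3.7) is seldom available in closed form, a simulation can stand in for it. The sketch below is only an illustration and not the paper's method: `estimate_p_a` and `theta_inverse` are hypothetical names, and because $P(a)$ depends on $F$ and $G$ only through $\theta$, the first sample is drawn (for simulation purposes only) from the uniform distribution on (0, 1), and the second from the c.d.f. $\theta$ by inversion. The check uses the Lehmann alternative $\theta(F) = F^k$ with the case tabulated in Table 1 of this paper ($m = 2$, $n = 4$, $k = 3.6173$), where $P(a) = .3743$ for $a = (2, 0, 0, 0, 0)$.

```python
import random
from bisect import bisect_left


def a_vector(first, second):
    """a(x) of (2.3): counts of first-sample values between ordered second-sample values."""
    cuts = sorted(second)
    counts = [0] * (len(cuts) + 1)
    for v in first:
        counts[bisect_left(cuts, v)] += 1
    return tuple(counts)


def estimate_p_a(a, theta_inverse, trials=20000, seed=0):
    """Monte Carlo estimate of P(a) under G = theta(F), taking F uniform on (0, 1)."""
    a = tuple(a)
    m, n = sum(a), len(a) - 1
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first = [rng.random() for _ in range(m)]                  # sample from F
        second = [theta_inverse(rng.random()) for _ in range(n)]  # sample from theta(F)
        hits += a_vector(first, second) == a
    return hits / trials


k = 3.6173
estimate = estimate_p_a((2, 0, 0, 0, 0), lambda u: u ** (1.0 / k))
print(round(estimate, 3))   # should lie close to the tabulated .3743
```

With 20000 trials the standard error is roughly 0.003, so agreement to about two decimal places is all that should be expected.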


4. The V-test. Consider the following degenerate case of testing a simple
hypothesis $H_0$ against a simple alternative $H_1$. (Note that these are not the
same as the hypotheses in Section 1.)


$H_0$: Both samples $x_1, \dots, x_m$ and $y_1, \dots, y_n$ are drawn from a common
specified c.d.f. $F$.

$H_1$: The sample $x_1, \dots, x_m$ is drawn from $F$, while the sample
$y_1, \dots, y_n$ is drawn from another specified c.d.f. $G$.


The most powerful test (not the most powerful rank test) in this case is
independent of the sample from $F$ and in fact is given by the critical region


(4.1) $y_1, \dots, y_n : \dfrac{g(y_1) \cdots g(y_n)}{f(y_1) \cdots f(y_n)} > \text{const.}$,


where $g$ and $f$ are the frequency functions of $G$ and $F$ respectively. Again, if,
as in (3.5), we have $G(x) = \theta(F(x))$ for all $x$ and if $\theta'(F)$ exists for $0 < F < 1$,
then the critical region (4.1) can be expressed as


(4.2) $F(y_1), \dots, F(y_n) : \prod_{i=1}^{n} \theta'(F(y_i)) > \text{const.}$


For instance, when $F$ and $G$ are normal distributions with unit variance, the
mean of $F$ being 0 and that of $G$ being $\delta$, it can be seen from equation (6.4) of
Section 6 that (4.2) can be written as


$F(y_1), \dots, F(y_n) : \sum_{i=1}^{n} \psi^{-1}(F(y_i)) > \text{const.}$,

where $\psi^{-1}$ is the functional inverse of the normal integral as defined in (6.1).
Since, however, $F$ itself is the standard normal c.d.f., the above critical region is
identical with


$y_1, \dots, y_n : \sum_{i=1}^{n} y_i > \text{const.}$

This is the usual optimum test for the normal mean when the variance is known.
Now the critical region (4.2) clearly depends on $F$ in addition to $\theta'$. However,
an approximation to (4.2) which depends on $\theta'$ alone can be worked out as
follows. For a given second sample $(y_1, \dots, y_n)$ the quantities


(4.3) $\left| F(y_i) - \dfrac{a_1 + \dots + a_i}{m} \right|$


can simultaneously be made arbitrarily small for $i = 1, \dots, n$, with as large a
probability as we please, by increasing sufficiently the size of the first sample.
Hence


(4.4) $a_1, \dots, a_{n+1} : \prod_{i=1}^{n} \theta'\!\left(\dfrac{a_1 + \dots + a_i}{m}\right) > \text{const.}$

is the suggested approximation to (4.2). Note that (4.4) is a non-parametric
test, depending only on the order relationships within the sample.
Now for testing the null hypothesis $H_0: G = F$ against the alternative
$H_1: G = \theta(F)$, we propose the V-statistic


(4.5) $V(a) = \prod_{i=1}^{n} \theta'\!\left(\dfrac{1 + a_1 + \dots + a_i}{m + 2}\right)$


or a suitable monotonic increasing function of the right side of (4.5), the
corresponding V-test being defined by the critical region


(4.6) $a : V(a) > \text{const.}$


This will also be referred to as the V-method of obtaining tests. The motivation
for the V-method is made clear in the preceding paragraph. In fact, (4.6) is
obtained from (4.4), with a small modification to prevent the V-statistic from
assuming infinitely large values.
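
As a rough illustration of the definition, the following Python sketch evaluates the V-statistic for an arbitrary $\theta'$, reading (4.5) as the product over $i = 1, \dots, n$ of $\theta'((1 + a_1 + \dots + a_i)/(m + 2))$. The function name `v_statistic` and the example alternative $\theta(F) = F^2$ are assumptions made for the example, not part of the paper.

```python
from itertools import accumulate
from math import prod


def v_statistic(a, theta_prime):
    """V(a) read as in (4.5): product of theta'((1 + a_1 + ... + a_i) / (m + 2)), i = 1..n."""
    m = sum(a)
    partial_sums = list(accumulate(a))[:-1]   # a_1 + ... + a_i for i = 1..n
    return prod(theta_prime((1 + s) / (m + 2)) for s in partial_sums)


# Hypothetical alternative for illustration: theta(F) = F**2, so theta'(F) = 2F.
def theta_prime(F):
    return 2.0 * F


print(v_statistic((2, 0, 0, 0), theta_prime))   # (2 * 0.75)**3 = 3.375
print(v_statistic((1, 1, 0, 0), theta_prime))   # 1.0 * 1.5 * 1.5 = 2.25
```

The arguments passed to $\theta'$ always lie between $1/(m+2)$ and $(m+1)/(m+2)$, so they never reach 0 or 1; this is the modification of (4.4) that keeps the statistic finite.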

About the intuitive appeal of the V-method, it may be said that some tests 
derived by its application, with a slight difference, have already been proposed 
by different authors on more or less intuitive grounds. This will be verified in 
some of the subsequent sections, where V-tests are compared with some of the 
known tests, including the one given by the statistic (3.7). 

Further it would appear from the above discussion that, though the V-method 
is put forward as a sure method of obtaining tests for simple alternatives, it 
can in some cases yield tests even for composite alternatives. This can be checked 
from the illustrations to follow. 
5. Lehmann's alternatives. In this case it is assumed that
(5.1) $G = \theta(F) = F^k$,
where $k > 1$. From (5.1) we have
(5.2) $\theta'(F) = kF^{k-1}$.


Hence from (4.6) the V-test for the present situation is given by the critical 
region 


(5.3) a:V(a) > const., 
where the V-statistic is defined by 


(5.4) $V(a) = \prod_{i=1}^{n} (1 + a_1 + \dots + a_i)$.


It is interesting to see that the V-test does not depend upon k for k > 1. The 
test for k < 1 can be obtained similarly. 
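
To see the ordering this produces, the sketch below enumerates every vector $a$ for $m = 2$, $n = 3$ and ranks them by the product of $(1 + a_1 + \dots + a_i)$ over $i = 1, \dots, n$. That product is taken here, as an assumption, to be the statistic of (5.4); it is an increasing function of (4.5) with $\theta'(F) = kF^{k-1}$ for any $k > 1$. The helper names `compositions` and `v_lehmann` are hypothetical.

```python
from itertools import accumulate
from math import prod


def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for head in range(total + 1):
        for tail in compositions(total - head, parts - 1):
            yield (head,) + tail


def v_lehmann(a):
    """Product over i = 1..n of (1 + a_1 + ... + a_i); free of k."""
    return prod(1 + s for s in list(accumulate(a))[:-1])


m, n = 2, 3
ranked = sorted(compositions(m, n + 1), key=v_lehmann, reverse=True)
top_five = ranked[:5]
for a in top_five:
    print(a, v_lehmann(a))
```

The five vectors printed are the same five that appear in the $m = 2$, $n = 3$ block of Table 1.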

I. R. Savage [3] has studied very extensively the alternatives in (5.1). He
also has tabulated the probabilities $P(a)$ in (3.7), when $\theta(F) = F^k$, for
different values of $k$ and different sample sizes.

In Table 1 we give the 5 a’s corresponding to the largest values of V(a) in 
(5.4). Now it so happens that the same a’s are the ones having the largest prob- 
abilities P(a) in (3.7), for all values of k > 1 considered by Savage [3]. For 
these values of k, the ordering of the P(a)’s mentioned above is also the same. 
Hence in Table 1 the P(a)’s are reproduced from Savage’s table for just one 
set of the values of k. From them we can construct the most powerful tests, 
defined by the critical regions a:P(a) > const., as in (3.4), up to the signifi- 
cance level of 25 per cent. Each of these tests, as can be seen from Table 1, will 
be equal in power to the corresponding V-test. Further from Savage’s table in 
[3] it appears that the performance of the V-test for sample sizes other than 
those considered in Table 1 is equally good. 

The statistic that Savage [3] has proposed for the present problem is, in our
notation,


(5.5) $T(a) = \sum_{i=1}^{n} \sum_{j = a_1 + \dots + a_i + i}^{m+n} \dfrac{1}{j}$,


the corresponding critical region being
(5.6) $a : T(a) < \text{const.}$


The 5 smallest values of $T(a)$ are reproduced in Table 1. It is difficult to see
any connection between (5.4) and (5.6). Now Savage [3] has proved that for
the cases $m = 2$, $n = 3$ and $m = 2$, $n = 4$, dealt with in Table 1, the simple
ordering of the probabilities $P(a)$ in (3.7) when $\theta(F) = F^k$ does not depend on
$k$ for $k > 1$, and is given by the statistic $T(a)$ in (5.5).
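
The tabulated values of $T(a)$ can be checked numerically. The sketch below assumes, since the printed form of (5.5) is not fully legible here, that $T(a)$ is the sum over the second sample of $\sum_{j=r}^{m+n} 1/j$, where $r = a_1 + \dots + a_i + i$ is the rank of the $i$th ordered second-sample value in the combined sample. This reading reproduces the five $T(a)$ values given in Table 1 for $m = 2$, $n = 3$; the function name `savage_t` is hypothetical.

```python
from itertools import accumulate


def savage_t(a):
    """Assumed form of T(a): sum over second-sample ranks r of sum_{j=r}^{m+n} 1/j."""
    a = tuple(a)
    m, n = sum(a), len(a) - 1
    total = 0.0
    for i, s in enumerate(list(accumulate(a))[:-1], start=1):
        rank = s + i   # rank of the i-th ordered second-sample value
        total += sum(1.0 / j for j in range(rank, m + n + 1))
    return total


for a in [(2, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1), (0, 2, 0, 0)]:
    print(a, round(savage_t(a), 4))
```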
TABLE 1


m = 2, n = 3 ($P(a)$ for $G = F^k$, k = 3.7769)

a            | T(a)
(2, 0, 0, 0) | 1.4333
(1, 1, 0, 0) | 1.9333
(1, 0, 1, 0) | 2.2667
(1, 0, 0, 1) | 2.5167
(0, 2, 0, 0) | 2.9333


m = 2, n = 4 ($P(a)$ for $G = F^k$, k = 3.6173)

a               | P(a)
(2, 0, 0, 0, 0) | .3743
(1, 1, 0, 0, 0) | .1621
(1, 0, 1, 0, 0) | .1106
(1, 0, 0, 1, 0) | .0862
(1, 0, 0, 0, 1) |


m = 3, n = 3 ($P(a)$ for $G = F^k$, k = 3.0546)

a            | P(a)
(3, 0, 0, 0) | .2549
(2, 1, 0, 0) | .1513
(2, 0, 1, 0) | .1130
(2, 0, 0, 1) | .0922
(1, 2, 0, 0) | .0746



Next we consider the alternative
(5.7) $G = \theta(F) = \lambda F^2 + (1 - \lambda)F$.
As proved by Lehmann [2], the well-known Wilcoxon test is optimum against
the alternative hypothesis (5.7) for very small values of $\lambda$. It is interesting to
see that the V-statistic (4.5) for the present situation is again Wilcoxon's
statistic. For from (5.7) we have
(5.8) $\theta'(F) = 2\lambda F + (1 - \lambda)$.
Now for very small values of $\lambda$, (5.8) can be written as


(5.9) $\theta'(F) = e^{2\lambda F - \lambda}$.
Hence the V-statistic is
(5.10) $V(a) = 2\lambda \sum_{i=1}^{n} \dfrac{1 + a_1 + \dots + a_i}{m + 2}$,
the corresponding V-test being defined by the critical region
(5.11) $a : V(a) > \text{const.}$
This is the usual one-sided Wilcoxon test.
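
The link with Wilcoxon's statistic can be made explicit: the rank of the $i$th ordered second-sample value in the combined sample is $a_1 + \dots + a_i + i$, so the sum of $(1 + a_1 + \dots + a_i)$ over $i = 1, \dots, n$ differs from the rank sum $W$ of the second sample only by the constant $n(n-1)/2$. A small Python check of this identity follows; the function names are hypothetical and the positive factor $2\lambda/(m+2)$ is dropped since it does not affect the ordering.

```python
from itertools import accumulate


def v_sum(a):
    """Sum over i = 1..n of (1 + a_1 + ... + a_i), i.e. V(a) without its constant factor."""
    return sum(1 + s for s in list(accumulate(a))[:-1])


def wilcoxon_rank_sum(a):
    """Sum of the combined-sample ranks of the second sample, recovered from a."""
    return sum(s + i for i, s in enumerate(list(accumulate(a))[:-1], start=1))


a = (1, 0, 1, 0)               # ordering x y y x y: second-sample ranks are 2, 3, 5
n = len(a) - 1
w = wilcoxon_rank_sum(a)
print(w)                                  # 10
print(v_sum(a), w - n * (n - 1) // 2)     # 7 7
```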


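6. Normal alternatives. Let $F$ and $G$ be the following distributions:


(6.1) $F(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} \exp(-\tfrac{1}{2} h^2)\, dh = \psi(x)$


and


(6.2) $G(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} \exp(-\tfrac{1}{2} (h + \delta)^2)\, dh
            = (2\pi)^{-1/2} \int_{-\infty}^{x+\delta} \exp(-\tfrac{1}{2} h^2)\, dh = \psi(x + \delta)$.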

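Now though for convenience of notation it has been assumed above that both $F$
and $G$ have variances equal to 1, the following arguments are valid for any
unknown common variance $\sigma^2$. Write


(6.3) $G(x) = \theta(F(x))$.
Then it is seen from (6.1) and (6.2) that


(6.4) $\theta'(F) = \left(\dfrac{\partial G}{\partial x}\right) \Big/ \left(\dfrac{\partial F}{\partial x}\right) = \exp\left(-\tfrac{1}{2}\delta^2 - \delta x\right)
                = \exp\left(-\tfrac{1}{2}\delta^2 - \delta\, \psi^{-1}(F)\right)$.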

Next since $\theta'(F)$ in (6.4) depends on $\delta$ it follows from (3.7) that the most
powerful rank order test of $H_0$ against $H_1$ may in general depend on $\delta$, though it
has been proved to be independent of $\delta$ for all sufficiently small values of $\delta$.
In fact, it is then Hoeffding's $C_1$ criterion [1]. Furthermore, some empirical
sampling investigations by Teichroew [4] suggest that the most powerful rank
order test may exist uniquely for all $\delta > 0$. The situation for $\delta < 0$ is similar.
However, no theoretical result is available in that direction.

It is interesting to see that in the present case the V-test obtained from (6.4)
above does not depend upon $\delta$. It follows from (6.4) and (4.6) that the V-test
is defined by the critical region
(6.5) $a : V(a) > \text{const.}$,


where the V-statistic is defined by


(6.6) $V(a) = -\sum_{i=1}^{n} \psi^{-1}\!\left(\dfrac{1 + a_1 + \dots + a_i}{m + 2}\right)$.
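
As a numerical illustration, the sketch below evaluates the normal-scores V-statistic, read here as $V(a) = -\sum_{i=1}^{n} \psi^{-1}((1 + a_1 + \dots + a_i)/(m+2))$, using the standard library's `statistics.NormalDist` for $\psi^{-1}$. The function name `v_normal` is hypothetical. For $m = 3$, $n = 2$ this reading reproduces the two legible $V(a)$ entries of Table 2.

```python
from itertools import accumulate
from statistics import NormalDist


def v_normal(a):
    """Minus the sum of normal quantiles at (1 + a_1 + ... + a_i) / (m + 2), i = 1..n."""
    m = sum(a)
    inverse_normal = NormalDist().inv_cdf
    return -sum(inverse_normal((1 + s) / (m + 2)) for s in list(accumulate(a))[:-1])


print(round(v_normal((3, 0, 0)), 2))   # -1.68
print(round(v_normal((2, 1, 0)), 2))   # -1.09
```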
In the following illustrations the relative frequencies are reproduced from
Teichroew's experiments [4] for some specified $\delta$. However, the ordering of the
a's by their relative frequencies is more or less the same for the other values of
$\delta$ considered by Teichroew. It can be seen from Table 2 that the ordering of
$a$ by $V(a)$ in (6.6) is nearly the same as that by the relative frequencies. The
performance of $V(a)$ is as good for the other sample sizes of Teichroew [4] as
for those considered in Table 2. It must, however, be said that for all these
illustrative cases Hoeffding's $C_1$-test [1] or van der Waerden's X-test [5] has
comparably good performance.


TABLE 2


m = 3, n = 2


a         | V(a) as in (6.6) | Relative frequency, $\delta = 0.75$ | X(a) as in (6.7) | Hoeffding's $C_1$
(3, 0, 0) | -1.68            |                                     |                  | -1.66
(2, 1, 0) | -1.09            |                                     |                  | -1.16
(1, 2, 0) |                  |                                     |                  | -0.66
(2, 0, 1) |                  |                                     |                  |
(1, 1, 1) |                  |                                     |                  |
(0, 3, 0) |                  |                                     |                  |
(1, 0, 2) |                  |                                     |                  |
(0, 2, 1) |                  |                                     |                  |


Now van der Waerden's X-statistic, in our notation, is defined as follows.


(6.7) $X(a) = -\sum_{i=1}^{n} \psi^{-1}\!\left(\dfrac{a_1 + \dots + a_i + i}{m + n + 1}\right)$.


When $m$ is large enough compared to $n$, $X(a)$ is nearly equal to $V(a)$ in (6.6).
For the case considered in Table 2, we see that even for $m = 3$ and $n = 2$,
the critical regions given by the two statistics are identical. It may also be of
interest to note that the X-test has been shown to be asymptotically equivalent
to the $C_1$-test.
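
The agreement of the two critical regions for $m = 3$, $n = 2$ can be checked by enumeration. The sketch below assumes the readings $V(a) = -\sum \psi^{-1}((1 + a_1 + \dots + a_i)/(m+2))$ for (6.6) and $X(a) = -\sum \psi^{-1}((a_1 + \dots + a_i + i)/(m+n+1))$ for (6.7), the latter being the usual van der Waerden normal-scores form; all function names are hypothetical.

```python
from itertools import accumulate
from statistics import NormalDist

INVERSE_NORMAL = NormalDist().inv_cdf


def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for head in range(total + 1):
        for tail in compositions(total - head, parts - 1):
            yield (head,) + tail


def v_normal(a):
    m = sum(a)
    return -sum(INVERSE_NORMAL((1 + s) / (m + 2)) for s in list(accumulate(a))[:-1])


def x_waerden(a):
    m, n = sum(a), len(a) - 1
    sums = list(accumulate(a))[:-1]
    return -sum(INVERSE_NORMAL((s + i) / (m + n + 1)) for i, s in enumerate(sums, start=1))


def ordering(statistic, vectors):
    # Round before sorting so exact ties are not split by floating-point noise.
    return sorted(vectors, key=lambda a: (round(statistic(a), 9), a))


vectors = list(compositions(3, 3))          # all a with m = 3, n = 2
same_order = ordering(v_normal, vectors) == ordering(x_waerden, vectors)
print(same_order)                           # True
```

Identical orderings imply identical critical regions at every attainable size, which is the statement made above for this case.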

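Next we consider two normal populations with the same mean but different
variances,


(6.8) $F(x) = (2\pi\sigma_1^2)^{-1/2} \int_{-\infty}^{x} \exp(-\tfrac{1}{2} h^2/\sigma_1^2)\, dh
            = (2\pi)^{-1/2} \int_{-\infty}^{x/\sigma_1} \exp(-\tfrac{1}{2} h^2)\, dh = \psi(x/\sigma_1)$


and


(6.9) $G(x) = (2\pi\sigma_2^2)^{-1/2} \int_{-\infty}^{x} \exp(-\tfrac{1}{2} h^2/\sigma_2^2)\, dh
            = (2\pi)^{-1/2} \int_{-\infty}^{x/\sigma_2} \exp(-\tfrac{1}{2} h^2)\, dh = \psi(x/\sigma_2)$.


³ These are Monte Carlo results; see [4].


Consider the problem of testing $H_0: \sigma_1 = \sigma_2$ against $\sigma_1 < \sigma_2$.
If we put $G(x) = \theta(F(x))$, from (6.8) and (6.9) we have


(6.10) $\theta'(F) = \left(\dfrac{\partial G}{\partial x}\right) \Big/ \left(\dfrac{\partial F}{\partial x}\right) = \exp\{\tfrac{1}{2}(x^2/\sigma_1^2 - x^2/\sigma_2^2)\}
                 = \exp\{\tfrac{1}{2}(\psi^{-1}(F))^2 - \tfrac{1}{2}(\sigma_1^2/\sigma_2^2)(\psi^{-1}(F))^2\}
                 = \exp\{\tfrac{1}{2}(1 - k^2)(\psi^{-1}(F))^2\}$,


where $\sigma_1^2/\sigma_2^2 = k^2$. Hence, to test $\sigma_1 < \sigma_2$, the critical region of the V-test
would be, from (6.10) and (4.6),


(6.11) $a : V(a) > \text{const.}$


Here the V-statistic is given by


(6.12) $V(a) = \sum_{i=1}^{n} \left(\psi^{-1}\!\left(\dfrac{1 + a_1 + \dots + a_i}{m + 2}\right)\right)^2$.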

Unfortunately we do not have the necessary Monte Carlo frequencies to judge 
empirically for small samples the performance of V(a) in (6.12). Later on, we 
shall have an interesting comparison of V(a) in (6.12) with some other statis- 
tics. Furthermore a different application of the V-method, suggested by the 
theorem in the next section, gives another test for the same problem of testing 
the variance ratio of two normal populations with the same mean. This test is 
discussed at the end of the next section. 


7. A theorem about the V-statistic. Consider a case where $G = \theta(F)$ and

(7.1) $\dfrac{\partial^2 \theta(F)}{\partial F^2} \ge 0$

for all $F$, which implies that

(7.2) $\dfrac{d}{dx}\left(\dfrac{\partial G/\partial x}{\partial F/\partial x}\right) \ge 0$.

Now (7.2) is precisely the monotone likelihood ratio condition of Savage [3].
Thus it follows from Theorem 6.1 of Savage [3] that if


(7.3) $a = (\dots, a_i, \dots, a_j, \dots)$,
      $a' = (\dots, a_i + 1, \dots, a_j - 1, \dots)$,

the other components of $a$ and $a'$ being identical and $i < j$, then
(7.4) $P(a') \ge P(a)$.

Further, from (4.5) and (7.1) we have

(7.5) $V(a') \ge V(a)$.
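
The inequality for the V-statistic can be checked numerically for a convex $\theta$. In the sketch below, $V$ is read as the product of $\theta'((1 + a_1 + \dots + a_i)/(m+2))$ over $i = 1, \dots, n$, and $a'$ is obtained from $a$ by adding one to component $i$ and subtracting one from component $j$, with $i < j$. The example $\theta(F) = F^3$ and the names `v_statistic` and `shift` are illustrative assumptions.

```python
from itertools import accumulate
from math import prod


def v_statistic(a, theta_prime):
    """Product of theta'((1 + a_1 + ... + a_i) / (m + 2)) over i = 1..n."""
    m = sum(a)
    return prod(theta_prime((1 + s) / (m + 2)) for s in list(accumulate(a))[:-1])


def shift(a, i, j):
    """a' of (7.3): add one to component i and subtract one from component j (0-based, i < j)."""
    moved = list(a)
    moved[i] += 1
    moved[j] -= 1
    return tuple(moved)


def theta_prime(F):
    return 3.0 * F ** 2        # theta(F) = F**3 is convex on (0, 1)


a = (1, 0, 2, 1)
checks = [
    v_statistic(shift(a, i, j), theta_prime) >= v_statistic(a, theta_prime)
    for i in range(len(a))
    for j in range(i + 1, len(a))
    if a[j] >= 1
]
print(all(checks))   # True
```

The reason is visible in the code: the shift raises by one every partial sum $a_1 + \dots + a_l$ with $i \le l < j$ and leaves the others unchanged, and $\theta'$ is non-decreasing when (7.1) holds.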


Now the V-statistic has been defined for a simple alternative $G = \theta(F)$, and
as such we can order simply the vectors $a$ according to the values $V(a)$.
Similarly the vectors $a$ can be ordered simply according to the values of $P(a)$ in
(3.7). Let $S$ be a set of vectors which could be arranged in such a way that for
any pair of successive vectors, $a_k$ and $a_{k+1}$ say, $a_k$ is related to $a_{k+1}$ as $a$ to $a'$
in (7.3) for some $i$, $j$ ($i < j$), where $i$, $j$ can vary with $k$. Then from (7.3), (7.4)
and (7.5) we have

THEOREM 7.⁴ For any function $\theta$ satisfying (7.1), a simple ordering of $S$
according to $P(a)$ is identical with a simple ordering given by $V(a)$.

There is a similar theorem if instead of (7.1) we have


$\dfrac{\partial^2 \theta(F)}{\partial F^2} \le 0$.


Now as already noted by Savage [3], for both Lehmann’s alternative in (5.1) 
and the normal alternative in (6.1)—(6.2), the monotone likelihood ratio con- 
dition (7.1) is fulfilled. This can also be checked from (5.2) and (6.4). Hence 
Theorem 7 above is applicable in both cases. 

On the other hand, for the alternative in (6.8)-(6.9), the condition (7.1) is
not fulfilled. This can be seen from (6.10). We therefore substitute


(7.6) $\tfrac{1}{2} x^2 = z$.


(This transformation is due to the referee.) Now the c.d.f.'s of $z$, viz. $F(z)$
and $G(z)$, corresponding to (6.8) and (6.9), are


(7.7) $F(z) = \pi^{-1/2} \int_{0}^{z/\sigma_1^2} e^{-h} h^{-1/2}\, dh$


and


(7.8) $G(z) = \pi^{-1/2} \int_{0}^{z/\sigma_2^2} e^{-h} h^{-1/2}\, dh$.


Now putting $G = \theta(F)$, we have

(7.9) $\theta'(F) = \text{const.} \exp[(1 - \sigma_1^2/\sigma_2^2)\, z/\sigma_1^2]
                = \text{const.} \exp[(1 - \sigma_1^2/\sigma_2^2)\, (1/\sqrt{2})\, I^{-1}(F, -\tfrac{1}{2})]$


⁴ The author is indebted to the referee for an important clarification in this theorem.
where $I^{-1}$ is the inverse of $I$ in (7.14). It is clear from (7.7), (7.8) and (7.9)
that the monotone likelihood ratio condition of (7.1) is now satisfied. Next
referring to the notation of Section 2, it can be seen that a point


$x = (x_1, \dots, x_{m+n})$


in $X$ is transformed by the substitution (7.6), i.e., $\tfrac{1}{2} x_i^2 = z_i$, $i = 1, \dots, m+n$,
into a point


(7.10) $z = (z_1, \dots, z_{m+n})$


in $Z$, say. Further, exactly analogous to $a = (a_1(x), \dots, a_{n+1}(x))$ in (2.3),
we can define on $Z$ a vector-valued function


(7.11) $c(z) = (c_1(z), \dots, c_{n+1}(z))$.


Now from (4.5), (7.9), and (7.11) the V-test for testing the null hypothesis
against the alternative in (7.7)-(7.8) is defined by the critical region


(7.12) $c : V(c) > \text{const.}$


where the V-statistic is given by
(7.13) $V(c) = \sum_{i=1}^{n} I^{-1}\!\left(\dfrac{1 + c_1 + \dots + c_i}{m + 2},\, -\tfrac{1}{2}\right)$,


$I^{-1}$ as before being the inverse of


(7.14) $I(u, -\tfrac{1}{2}) = \pi^{-1/2} \int_{0}^{u/\sqrt{2}} e^{-h} h^{-1/2}\, dh$.


(The values of (7.14) for different u's are tabulated in Tables of the Incomplete
Γ-Function, Cambridge University Press, 1946.)

Two suggestions for comparing the statistics (7.13) and (6.12) are as follows:
(i) It will be remembered from (6.8) and (6.9) that for convenience of notation
we assumed the common mean of the two populations to be zero. Actually,
if $\mu$ is the common mean, the transformation (7.6) would be $\tfrac{1}{2}(x - \mu)^2 = z$,
which means the vector $c$ in (7.11) depends on $\mu$. That is, contrary to the test
(6.11), the test (7.12) cannot be worked out unless $\mu$ is known. (ii) Theorem 7
above is valid for (7.13) while it is not valid for (6.12).

Now Theorem 7 implies some justification for using any of the V-statistics,
such as (5.4) or (6.6), where $\theta$ satisfies (7.1), for testing against a wider
alternative hypothesis $G = \theta(F)$, which does not specify anything about $\theta$, excepting
that it satisfies the monotone likelihood ratio condition (7.1).

In the next section we develop a test of the null hypothesis $H_0: F = G$ against
the general alternative $H_1: F \neq G$.


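8. $\phi$-test. Substituting in (3.7)


(8.1) $\phi(a/p) = \dfrac{m!}{\prod_{i=1}^{n+1} a_i!} \prod_{i=1}^{n+1} (p_i)^{a_i}$,
we have


(8.2) $P(a) = \int \cdots \int_D \phi(a/p)\, n! \prod_{i=1}^{n} \theta'(p_1 + \dots + p_i) \prod_{i=1}^{n} dp_i$.

Now in (8.1), $a_i/m$ is the maximum likelihood estimate of $p_i$, $i = 1, \dots, n+1$.
Therefore if


(8.3) $\phi(a) = \dfrac{m!}{\prod_{i=1}^{n+1} a_i!} \prod_{i=1}^{n+1} (a_i/m)^{a_i}$,


it follows that
(8.4) $\phi(a) \ge \phi(a/p)$.
Thus from (8.1), (8.2) and (8.4) we have


(8.5) $P(a) \le \phi(a) \int \cdots \int_D n! \prod_{i=1}^{n} \theta'(p_1 + \dots + p_i) \prod_{i=1}^{n} dp_i$.


Now using the transformation $p_1 + \dots + p_i = q_i$, $i = 1, \dots, n$, we have


(8.6) $\int \cdots \int_D \prod_{i=1}^{n} \theta'(p_1 + \dots + p_i) \prod_{i=1}^{n} dp_i = \int \cdots \int_{D'} \prod_{i=1}^{n} \theta'(q_i) \prod_{i=1}^{n} dq_i$,
where


$D' = \{q_1, \dots, q_n : 0 \le q_i \le 1,\ q_{i-1} \le q_i,\ i = 1, \dots, n\}$,


$q_0$ denoting 0. Integrating the right hand side of (8.6) term by term and noting
that $\theta(0) = 0$ and $\theta(1) = 1$, we have


(8.7) $\int \cdots \int_{D'} \prod_{i=1}^{n} \theta'(q_i) \prod_{i=1}^{n} dq_i = 1/n!$.
Hence from (8.5), (8.6) and (8.7) it follows that
(8.8) $P(a) \le \phi(a)$.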

Next we recollect the definition of


(8.9) $b = (b_1, \dots, b_{m+1})$

in (2.4) of Section 2. It has also been noted that $a$ defines $b$ uniquely and
conversely. Therefore we have

(8.10) $P(a) = P(b)$.

Further it follows from (8.3) and (8.8) that if


(8.11) $\phi(b) = \dfrac{n!}{\prod_{i=1}^{m+1} b_i!} \prod_{i=1}^{m+1} (b_i/n)^{b_i}$,


then
(8.12) $P(b) \le \phi(b)$.
We can write (8.8), (8.10) and (8.12) together as
(8.13) $P(a) = P(b) \le \min(\phi(a), \phi(b))$.

Now define the $\phi$-statistic as the minimum of $\phi(a)$ and $\phi(b)$, i.e.,
(8.14) $\phi = \min(\phi(a), \phi(b)) = \phi(a, b)$.


Then the $\phi$-test for testing $H_0: F = G$ against $H_1: F \neq G$ is defined by the
critical region


(8.15) $\phi > \text{const.}$


The motivation for this test lies in the inequality (8.13) and the fact that in a
degenerate case when there exist two numbers $u$ and $v$, $v > u$, such that $F(u) = 1$
and $G(v) = 0$, then


(8.16) $P(a) = P(b) = \phi = 1$


with probability equal to unity.
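
A direct computation of the $\phi$-statistic may help fix ideas. The sketch below reads (8.3) and (8.11) as the multinomial probability of the observed counts evaluated at their own relative frequencies, and recovers $b$ from $a$ by replaying the combined ordering. The function names `phi`, `b_from_a` and `phi_statistic` are hypothetical.

```python
from math import factorial, prod


def phi(counts):
    """Multinomial probability of `counts` at the cell probabilities counts / total."""
    total = sum(counts)
    coefficient = factorial(total) / prod(factorial(c) for c in counts)
    # In Python 0.0 ** 0 evaluates to 1.0, so empty cells contribute a factor of one.
    return coefficient * prod((c / total) ** c for c in counts)


def b_from_a(a):
    """Recover b of (2.4) from a of (2.3) by walking through the combined ordering."""
    b, run = [], 0
    for i, count in enumerate(a):
        for _ in range(count):        # each first-sample value closes the current gap
            b.append(run)
            run = 0
        if i < len(a) - 1:            # then comes the next ordered second-sample value
            run += 1
    b.append(run)
    return tuple(b)


def phi_statistic(a):
    """phi(a, b) of (8.14): the smaller of phi(a) and phi(b)."""
    return min(phi(a), phi(b_from_a(a)))


print(b_from_a((1, 0, 1, 0)))        # (0, 2, 1)
print(phi_statistic((2, 0, 0, 0)))   # 1.0, the degenerate case of (8.16)
print(phi_statistic((1, 0, 1, 0)))   # min(1/2, 4/9), i.e. about 0.444
```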
Wolfowitz [6] proposed a test statistic equivalent to


(8.17) $W = \phi(a) \cdot \phi(b)$


for testing $H_0: F = G$ against $F \neq G$. In the numerical illustrations in Section
10, the $\phi$-statistic defined in (8.14) appears to be better than Wolfowitz' statistic
in (8.17), though possibly quite a few statements made hereafter in case of the
$\phi$-statistic may also hold for $W$ in (8.17). A simple method for computing the
$\phi$-statistic could be obtained from one suggested by Wolfowitz [6] for his statistic
$W$ above.


9. Some properties of the $\phi$-statistic. Let
(9.1) $a = (\ldots, a_i, \ldots, a_j, \ldots)$
(9.2) $a' = (\ldots, a_i + 1, \ldots, a_j - 1, \ldots)$


be two vectors with their $i$th and $j$th components ($i < j$) as shown ($a_i, a_j \ge 1$),
other components of $a$ being equal to the corresponding ones of $a'$. Now from
(8.3) we have


(9.3) $\dfrac{\phi(a')}{\phi(a)} = \dfrac{(1 + 1/a_i)^{a_i}}{(1 + 1/(a_j - 1))^{a_j - 1}}$.


Thus from the monotonicity of the function $(1 + 1/x)^x$, if in (9.1) and (9.2)
(9.4) $a_i \ge a_j$,

we have

(9.5) $\phi(a') > \phi(a)$.
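
The ratio (9.3) is easy to check numerically. The sketch below assumes $\phi(a) = m! \prod (a_i/m)^{a_i} / \prod a_i!$, as implied by (11.7), and treats the factor with $a_j - 1 = 0$ as 1; the helper names `phi` and `ratio_9_3` are illustrative only.

```python
from math import factorial, prod


def phi(counts):
    """phi = total! * prod((k/total)^k) / prod(k!), with 0^0 taken as 1."""
    total = sum(counts)
    numerator = factorial(total) * prod((k / total) ** k for k in counts if k > 0)
    return numerator / prod(factorial(k) for k in counts)


def ratio_9_3(a_i, a_j):
    """Right-hand side of (9.3); the factor with a_j - 1 = 0 is taken as 1."""
    numerator = (1 + 1 / a_i) ** a_i
    denominator = 1.0 if a_j == 1 else (1 + 1 / (a_j - 1)) ** (a_j - 1)
    return numerator / denominator


if __name__ == "__main__":
    a, a_prime = [3, 2], [4, 1]  # move one unit from the j-th to the i-th cell
    print(phi(a_prime) / phi(a), ratio_9_3(3, 2))  # the two values agree
```

Since $a_i = 3 \ge a_j = 2$ in the example, the ratio exceeds 1, as (9.5) requires.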


Further, it follows from the above argument and the symmetry of $\phi(a)$ in
$a_1, \ldots, a_{n+1}$ that if
(9.6) $a'' = (\ldots, a_i + a_j, \ldots, 0, \ldots)$


with its $i$th and $j$th components as shown, other components being equal to the
corresponding ones of $a$ in (9.1), then


(9.7) $\phi(a'') > \phi(a)$


regardless of the condition (9.4) above. Again as in (8.9) define $b$ and $b'$ from
$a$ and $a'$ in (9.1) and (9.2). Then it follows from arguments similar to the above
that


(9.8) $\phi(b') \ge \phi(b)$.


Thus from (9.5) and (9.8) we have for $a$, $b$ and $a'$, $b'$ defined by (9.1) and (9.2)
above, provided (9.4) holds,


(9.9) $\phi(a', b') \ge \phi(a, b)$,


$\phi$ being given by (8.14). Similarly from (9.6) and (9.7) we have, regardless
of (9.4),


(9.10) $\phi(a'', b'') \ge \phi(a, b)$.
Now suppose F and G are such that $G(x) = \theta(F(x))$ and


(9.11) $\dfrac{\partial^2 \theta(F)}{\partial F^2} > 0$


for all F. Then as said in Section 7, (9.11) implies


(9.12) $\dfrac{d}{dx}\left(\dfrac{dG/dx}{dF/dx}\right) > 0$


for all x. Now (9.12) is the monotone likelihood condition in terms of Savage
[3]. Thus it follows from Theorem 6.1 of Savage [3] that if the condition (9.12)
or (9.11) above is satisfied,


(9.13) $P(a') \ge P(a)$


for $a$ and $a'$ in (9.1) and (9.2) assuming $i < j$.

Now since $b$ in (8.9) is uniquely determined by $a$, we may consider the statistic
$\phi(a, b)$ in (8.14) as a function of $a$ alone, ignoring $b$. Thus we can get a simple
ordering of the vectors $a$, according to the values of $\phi(a, b)$. Similarly for a
simple alternative $G = \theta(F)$ we can have a simple ordering of vectors $a$, according
to the statistic $P(a)$ in (3.7). Consider a fixed vector


$a = (\ldots, a_i, \ldots, a_j, \ldots)$


having $a_i \ge a_j$ ($i < j$) and let $a'$ be the vector obtained from $a$ by replacing
the $i$th and $j$th coordinates of $a$ by $a_i + k$ and $a_j - k$ respectively. For the same
$i$ and $j$, allowing $k$ to take values $1, 2, \ldots, a_j$, we get a set of vectors $a'$ which
we denote by $S'$. Then we have from (9.1), (9.2), (9.4), (9.9), (9.11) and
(9.13)

THEOREM 9. For any function $\theta$ satisfying (9.11), a simple ordering of $S'$ by
the statistic $P(a')$ in (3.7) is identical with a simple ordering given by $\phi(a', b')$ in
(8.14).

We shall have a similar theorem, if instead of (9.11) we have


(9.14) $\partial^2 G/\partial F^2 < 0$.


Theorem 9 above also establishes a relation between the $\phi$-statistic in (8.14)
and the V-statistic defined in (4.5) for which Theorem 7 was true.


10. Numerical illustration. In Table 3 we have given values of different
statistics, for comparison. As already noted in Section 5, the ranking of the
vectors a according to the probabilities P(a) in column 2 remains the same for
all values of k > 1 in Savage's Table [3]. Column 5 gives the relative frequencies
in Teichroew's experiments [4]. Again the ranking of a's according to the relative
frequencies, for all values of the parameter considered in [4], remains nearly the same.


TABLE 3

Columns: (1) vector a; (2) P(a); (3) X-stat. (6.7); (4) C_1-stat.;
(5) Rel. freq. (Monte Carlo), F, G normal, equal var., parameter value 0.75;
(6) V-stat. (6.6), F, G normal, equal var.; (7) V-stat. (6.12), F, G normal, equal means;
(8) $\phi$-stat. (8.14); (9) W-stat. (8.17); (10) Smirnov stat.

Only the entries of columns 1 to 5 shown below are legible in the source; a
period marks an entry that is not legible, and the entries of columns 6 to 10
are not legible.

(1)      | (2)   | (3)   | (4)   | (5)
(3,0,0)  | .0038 | -1.40 | -1.66 | 2.25
(2,1,0)  | .     | -0.97 | -1.16 | 3.45
(1,2,0)  | .     | -0.54 | -0.66 | 4.45
(2,0,1)  | .     | -0.43 | -0.50 | .
(1,1,1)  | .     | -0.00 | -0.00 | .
(0,3,0)  | .0667 |  0.00 |  0.00 | .
(1,0,2)  | .     |  0.43 |  0.50 | 11.15
(0,2,1)  | .     |  0.54 |  0.66 | 12.00
(0,1,2)  | .     |  0.97 |  1.16 | 18.45
(0,0,3)  | .     |  1.40 |  1.68 | 27.30

The ranking of vectors a in column 1, by the V-statistic (for testing the variance
ratio) in column 7 agrees better with the ranking by the $\phi$-statistic in
column 8 than with the rankings by the statistics in columns 9 and 10 respectively.
It should be noted that the statistics $X$, $C_1$, and $V$ in columns 3, 4 and
6 respectively are meant to test one-sided alternatives. We can however construct,
intuitively, two-sided tests based on them, having the corresponding
critical regions $|X| >$ const., $|C_1| >$ const. and $|V| >$ const. Now in Table
3, the ranking of the vectors a in column 1 by any of the statistics $|X|$, $|C_1|$,
and $|V|$ agrees better with the ranking by the $\phi$-statistic in column 8 than with
the rankings by the statistics in columns 9 or 10 respectively. Of course more
empirical investigation is necessary to arrive at practically usable conclusions.


11. Some possibilities for the asymptotic behavior of the V and $\phi$-statistics.
This section consists of a few conjectures or guesses. From (4.5) we have,


(11.1) $\log V = \sum_{i=1}^{n} \log \theta'\!\left(\dfrac{1 + a_1 + \cdots + a_i}{m + 2}\right)$.

Now fixing the second sample $y_1, \ldots, y_n$, let the size $m$ of the first sample go
to $\infty$. Then in view of (4.3), for both null and alternative hypotheses, with
probability 1,


(11.2) $\log V = \sum_{i} \log \theta'(F(y_i))$.


Now the asymptotic normality of $\log V$ in (11.1) as $n \to \infty$ could possibly be
derived from the fact that the $F(y)$'s in (11.2) are distributed identically and
independently on both the null and alternative hypotheses. On the null hypothesis
the $F(y)$'s are distributed rectangularly, $0 \le F \le 1$. On the alternative
hypothesis, the frequency function of $F(y)$ is $\theta'(F)$, $0 \le F \le 1$. This also suggests
that the mean values of the asymptotic distribution of $\log V$ in (11.1)
might be


(11.3) $n \int_0^1 (\log \theta'(F)) \, dF$,


(11.4) $n \int_0^1 (\log \theta'(F)) \, \theta'(F) \, dF$


on the null and alternative hypotheses respectively. Similarly the variances, on
the null and alternative hypotheses, could possibly be expressed as follows.


(11.5) $n \left\{ \int_0^1 (\log \theta'(F))^2 \, dF - \left[ \int_0^1 \log \theta'(F) \, dF \right]^2 \right\}$,


(11.6) $n \left\{ \int_0^1 (\log \theta'(F))^2 \, \theta'(F) \, dF - \left[ \int_0^1 (\log \theta'(F)) \, \theta'(F) \, dF \right]^2 \right\}$.


The above integrals could be evaluated by means of numerical integration. 
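
As a rough illustration of such a numerical evaluation, the sketch below computes the conjectured moments (11.3) to (11.6) with a plain midpoint rule. The alternative $\theta(F) = F^2$, so that $\theta'(F) = 2F$, is an assumption chosen only because the integrals then have simple closed forms to compare against; the names `midpoint` and `log_v_moments` are hypothetical.

```python
from math import log


def midpoint(f, n_steps=100_000):
    """Midpoint rule on (0, 1); it never evaluates f at the end points,
    so a logarithmic singularity at 0 does not cause an error."""
    h = 1.0 / n_steps
    return h * sum(f((k + 0.5) * h) for k in range(n_steps))


def log_v_moments(theta_prime, n):
    """Conjectured asymptotic means and variances of log V, (11.3)-(11.6)."""
    g = lambda F: log(theta_prime(F))
    mean0 = midpoint(g)
    second0 = midpoint(lambda F: g(F) ** 2)
    mean1 = midpoint(lambda F: g(F) * theta_prime(F))
    second1 = midpoint(lambda F: g(F) ** 2 * theta_prime(F))
    return {
        "null_mean": n * mean0,                   # (11.3)
        "alt_mean": n * mean1,                    # (11.4)
        "null_var": n * (second0 - mean0 ** 2),   # (11.5)
        "alt_var": n * (second1 - mean1 ** 2),    # (11.6)
    }


if __name__ == "__main__":
    # Assumed alternative theta(F) = F^2, i.e. theta'(F) = 2F.
    print(log_v_moments(lambda F: 2.0 * F, n=10))
```

For this particular $\theta$ the exact values per observation are $\log 2 - 1$ and $\log 2 - 1/2$ for the means and 1 and 1/4 for the variances, which the midpoint rule reproduces to about three decimal places.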

Next it seems from (4.3) that when $m \to \infty$, $n$ being fixed, the power of the
V-test is equal to that of the corresponding optimum parametric test. In particular,
van der Waerden's X-test (6.7), which is equivalent to the corresponding
V-test (6.6) as $m \to \infty$, has been proved in [5] to be asymptotically as powerful
as the t-test.

The asymptotic distribution of the $\phi$-statistic in (8.14) is difficult to guess.
However $\phi(a)$ in (8.3) seems to be relatively easy to handle. From (8.3) we
have


(11.7) $\log \phi(a) = \log m! - m \log m + \sum_{i=1}^{n+1} a_i \log a_i - \sum_{i=1}^{n+1} \log a_i!$.



Now suppose $a = (a_1, \ldots, a_i, \ldots, a_{n+1})$ in (11.7) is such that all the $a_i$'s
are large enough so that Stirling's approximation can be applied to $a_i!$, $i =
1, \ldots, n + 1$. Then from (11.7) we have


(11.8) $\log \phi(a) = \text{const.} - \tfrac{1}{2} \sum_{i=1}^{n+1} \log a_i$.



Further (except for degenerate alternatives) all the $a_i$ will be large enough for
Stirling's approximation, with as large a probability as we may wish, if, fixing
the second sample $y_1, \ldots, y_n$, we increase the size $m$ of the first sample sufficiently.
The asymptotic normality of the expressions in (11.7) and (11.8),
ignoring constants, follows from a theorem due to Wolfowitz [6], under the
condition $m = n + 1$. Otherwise the asymptotic normality of (11.8) is obtainable
from arguments similar to those in the preceding paragraph.
A NONPARAMETRIC TEST FOR THE PROBLEM OF
SEVERAL SAMPLES


By V. P. BHAPKAR 
University of North Carolina and University of Poona 


1. Summary. In this paper, a new nonparametric test for the problem of c 
samples is offered. It is based upon the numbers of c-plets that can be formed by 
choosing one observation from each sample such that the observation from the 
ith sample is the least, $i = 1, 2, \ldots, c$. The asymptotic distribution of the new
test statistic is derived by an application of the extension of Hoeffding’s theorem 
[4] on U-statistics to the case of c samples. The asymptotic power and the 
asymptotic efficiencies of this test relative to the Kruskal-Wallis H-test [7] and 
the Mood-Brown M-test [10] are computed in standard fashion along the lines 
of Andrews’ paper [1]. 


2. Introduction. Let $x_{i1}, x_{i2}, \ldots, x_{in_i}$ be independent (real-valued) observations
from the ith population with c.d.f. $F_i$, $i = 1, 2, \ldots, c$, and suppose that
these c samples are independent. The F's are assumed to be continuous. We
consider a certain nonparametric test for the hypothesis


$K_0\colon F_1 = F_2 = \cdots = F_c$.


If we assume that the populations are approximately of the same form, in the 
sense that if they differ it is by a shift or translation, then we may say that we 
are testing for the equality of location parameters. References to prior work 
on several-sample tests and some of the recent work may be found in [2], [6], 
[7], [8], and [10]. 

Let $v^{(i)}$ be the number of c-plets that can be formed by choosing one observation
from each sample such that the observation from the ith sample is the least.
Then


(2.1) $v^{(i)} = \sum_{j=1}^{n_i} \prod_{r \ne i} \{\text{number of } x_{rs} > x_{ij},\ s = 1, 2, \ldots, n_r\}$.

The new test-statistic proposed is


(2.2) $V = N(2c - 1)\left[\sum_{i=1}^{c} p_i (u^{(i)} - c^{-1})^2 - \left\{\sum_{i=1}^{c} p_i (u^{(i)} - c^{-1})\right\}^2\right]$,


where $N = \sum_i n_i$, $p_i = n_i/N$ and $u^{(i)} = v^{(i)}/(n_1 n_2 \cdots n_c)$. When the hypothesis
$K_0$ is true, it will be seen that the expectation of each $u^{(i)}$ is $1/c$. Thus, V may
be considered as a measure of deviation from $K_0$. The motivation behind the
use of the v's is simply to generalize, to the case of several samples, the Wilcoxon
[12] statistic for two samples (the number of times observations in the first
sample are smaller than observations in the second sample). The test consists
in rejecting $K_0$ at a significance level $\alpha$ if V exceeds some predetermined number
$V_\alpha$. In the next section it is shown that, when $K_0$ is true, V is asymptotically
distributed as a $\chi^2$ variable with $c - 1$ degrees of freedom. Thus, a large sample
approximation for $V_\alpha$ is provided by the upper $\alpha$-point of the $\chi^2$ distribution
with $c - 1$ degrees of freedom. It is conjectured that this approximation is
relatively close even for samples of moderate size.
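
The definitions (2.1) and (2.2) translate directly into code. The following Python sketch is only an illustration: it assumes continuous data without ties, and the function name `bhapkar_v` and its return convention are hypothetical.

```python
from bisect import bisect_right


def bhapkar_v(samples):
    """Return (V, u) for a list of c samples, following (2.1) and (2.2).
    u[i] is the proportion of c-plets in which sample i supplies the least value."""
    c = len(samples)
    sizes = [len(s) for s in samples]
    N = sum(sizes)
    n_cplets = 1
    for n in sizes:
        n_cplets *= n
    ordered = [sorted(s) for s in samples]

    u = []
    for i, sample in enumerate(samples):
        v_i = 0
        for x in sample:
            term = 1
            for r, other in enumerate(ordered):
                if r != i:
                    # number of observations in sample r exceeding x
                    term *= len(other) - bisect_right(other, x)
            v_i += term
        u.append(v_i / n_cplets)

    p = [n / N for n in sizes]
    dev = [u_i - 1.0 / c for u_i in u]
    sum_sq = sum(p_i * d * d for p_i, d in zip(p, dev))
    sum_lin = sum(p_i * d for p_i, d in zip(p, dev))
    return N * (2 * c - 1) * (sum_sq - sum_lin ** 2), u


if __name__ == "__main__":
    V, u = bhapkar_v([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print(V, u)
```

In the example the first sample lies entirely below the other two, so u = (1, 0, 0) and V = 20/3; under the large-sample approximation described above this value would be referred to the $\chi^2$ distribution with c - 1 = 2 degrees of freedom.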


3. The asymptotic distribution of V under $K_0$. It will be seen that


(3.1) $v^{(i)} = \sum_{t_1=1}^{n_1} \sum_{t_2=1}^{n_2} \cdots \sum_{t_c=1}^{n_c} \phi^{(i)}(x_{1t_1}, x_{2t_2}, \ldots, x_{ct_c})$,

where


(3.2) $\phi^{(i)}(x_1, x_2, \ldots, x_c) = 1$ if $x_i < x_k$ for all $k = 1, \ldots, c$ except $i$,
      $\phi^{(i)}(x_1, x_2, \ldots, x_c) = 0$ otherwise.


Thus, $u^{(i)}$ is a generalized U-statistic [11] corresponding to $\phi^{(i)}$. We shall make
use of the following generalization of Hoeffding's theorem [4] on U-statistics
to the case of c samples:

LEMMA 3.1. Let $X_{ij}$, $j = 1, 2, \ldots, n_i$ for a fixed $i$ be independent (real or
vector) random variables identically distributed with c.d.f. $F_i$, $i = 1, 2, \ldots, c$.
Further, let $\sum_{i=1}^{c} n_i = N$ and

$U_N^{(r)} = \prod_{i=1}^{c} \binom{n_i}{m_i^{(r)}}^{-1} \sum{}^{*} \phi^{(r)}(X_{1\alpha_1}, \ldots, X_{1\alpha_{m_1^{(r)}}}; X_{2\beta_1}, \ldots, X_{2\beta_{m_2^{(r)}}}; \ldots; X_{c\delta_1}, \ldots, X_{c\delta_{m_c^{(r)}}})$, $r = 1, 2, \ldots, g$,

where each $\phi^{(r)}$ is a function symmetric in each set of its arguments and $\sum^{*}$ denotes
the sum over all combinations $(\alpha_1, \ldots, \alpha_{m_1^{(r)}})$ of $m_1^{(r)}$ integers chosen from
$(1, 2, \ldots, n_1)$ and so on for $\beta$'s, $\ldots$, and $\delta$'s. Assume that $E[\phi^{(r)}] = \eta^{(r)}$ and
$E[\phi^{(r)}]^2 < \infty$. Then

(i) $E[U_N^{(r)}] = \eta^{(r)}$,

(ii) $\mathrm{Cov}[U_N^{(r)}, U_N^{(s)}] = \prod_{i=1}^{c} \binom{n_i}{m_i^{(r)}}^{-1} \sum_{d_1=0}^{m_1^{(rs)}} \cdots \sum_{d_c=0}^{m_c^{(rs)}} \prod_{i=1}^{c} \binom{m_i^{(s)}}{d_i} \binom{n_i - m_i^{(s)}}{m_i^{(r)} - d_i} \zeta_{d_1, \ldots, d_c}(r, s)$,

where $m_i^{(rs)} = \min(m_i^{(r)}, m_i^{(s)})$ and

$\zeta_{d_1, \ldots, d_c}(r, s) = E[\phi^{(r)}(X_{11}, \ldots, X_{1d_1}, X_{1,d_1+1}, \ldots, X_{1m_1^{(r)}}; \ldots; X_{c1}, \ldots, X_{cd_c}, X_{c,d_c+1}, \ldots, X_{cm_c^{(r)}})$
$\times\, \phi^{(s)}(X_{11}, \ldots, X_{1d_1}, X_{1,m_1^{(r)}+1}, \ldots, X_{1,m_1^{(r)}+m_1^{(s)}-d_1}; \ldots; X_{c1}, \ldots, X_{cd_c}, X_{c,m_c^{(r)}+1}, \ldots, X_{c,m_c^{(r)}+m_c^{(s)}-d_c})] - \eta^{(r)}\eta^{(s)}$,

$r, s = 1, 2, \ldots, g$, it being understood that $r = s$ gives us $\mathrm{Var}(U_N^{(r)})$, and

(iii) $N^{1/2}[U_N - \eta]$ is, in the limit as $N \to \infty$ in such a way that $n_i = Np_i$, the
p's being fixed numbers such that $\sum_i p_i = 1$, normally distributed with zero mean
and asymptotic covariance matrix $\Sigma = (\sigma_{rs})$ given by

(3.4) $\sigma_{rs} = \sum_{i=1}^{c} \dfrac{m_i^{(r)} m_i^{(s)}}{p_i}\, \zeta_{0, \ldots, 0, 1, 0, \ldots, 0}(r, s)$ (1 at the $i$th place),

where $U_N' = (U_N^{(1)}, \ldots, U_N^{(g)})$ and $\eta' = (\eta^{(1)}, \ldots, \eta^{(g)})$.


PROOF. The proof of this lemma (concerning generalized U-statistics [11]) is
a straightforward extension of the proof of Hoeffding's theorem [4] on U-statistics,
and the details are omitted.

Now to apply the lemma to our problem, we note from (3.1) that

$u^{(i)} = v^{(i)}/n_1 n_2 \cdots n_c$, $i = 1, 2, \ldots, c$,

are generalized U-statistics with $g = c$ and $m_1^{(i)} = m_2^{(i)} = \cdots = m_c^{(i)} = 1$. Then
if $K_0$ is true, $\eta^{(i)} = P[X_i < X_k$ for $k = 1, \ldots, c$ except $k = i]$, where the X's
are independent and identically distributed random variables, and hence

(3.5) $\zeta_{0, \ldots, 0, 1, 0, \ldots, 0}(i, i)$ (1 at the $i$th place)
      $= E[\phi^{(i)}(X_1, \ldots, X_i, \ldots, X_c)\, \phi^{(i)}(X_1', \ldots, X_i, \ldots, X_c')] - c^{-2}$,

where again the X's and X''s are independent and identically distributed random
variables, so that

(3.6) $\zeta_{0, \ldots, 0, 1, 0, \ldots, 0}(i, i)$ (1 at the $i$th place)
      $= P[X_i < X_k, X_i < X_k', k = 1, \ldots, c$ except $k = i] - c^{-2}$
      $= \dfrac{(c - 1)^2}{c^2(2c - 1)}$,

(3.7) $\zeta_{0, \ldots, 0, 1, 0, \ldots, 0}(i, i)$ (1 at the $j$th place)
      $= P[X_i < X_k, X_i' < X_j, X_i' < X_l'$ for all $k = 1, \ldots, c$
        except $i$ and all $l = 1, \ldots, c$ except $i$ and $j] - c^{-2}$
      $= [c^2(2c - 1)]^{-1}$;

and similarly, for $i \ne j$,

(3.8) $\zeta_{0, \ldots, 0, 1, 0, \ldots, 0}(i, j) = -\dfrac{c - 1}{c^2(2c - 1)}$ if 1 is at the $i$th or the $j$th place in the row of 0's,
      $\zeta_{0, \ldots, 0, 1, 0, \ldots, 0}(i, j) = [c^2(2c - 1)]^{-1}$ otherwise.

Thus, if $K_0$ is true, from (3.4) we have

(3.9) $\sigma_{ii} = [c^2(2c - 1)]^{-1}\left[\dfrac{(c - 1)^2}{p_i} + \sum_{k \ne i} \dfrac{1}{p_k}\right]$

and

(3.10) $\sigma_{ij} = [c^2(2c - 1)]^{-1}\left[\sum_{k \ne i, j} \dfrac{1}{p_k} - \dfrac{c - 1}{p_i} - \dfrac{c - 1}{p_j}\right]$, $i \ne j$.


The above two relations give us

(3.11) $c^2(2c - 1)\Sigma = \left(\sum_{k=1}^{c} 1/p_k\right) J_{cc} + c^2 D - c\,q J_{1c} - c J_{c1} q'$,

where $D$ = diagonal $(1/p_k, k = 1, 2, \ldots, c)$, $q' = (1/p_1, \ldots, 1/p_c)$ and
$J_{ab} = (1)_{a \times b}$. Hence from (iii) in Lemma 3.1 it follows that $N^{1/2}[U - J_{c1}/c]$,
where $U' = (u^{(1)}, \ldots, u^{(c)})$, has a limiting normal distribution with zero means
and asymptotic covariance matrix $\Sigma$ given by (3.11). But $\sum_i v^{(i)} = n_1 n_2 \cdots n_c$,
and hence the u's are subject to one linear constraint, viz., $\sum_i u^{(i)} = 1$. Thus the
distribution of the u's is singular and hence the asymptotic distribution is also
singular. Then $\Sigma$ is singular; in fact it can be easily verified from (3.11) that
$J_{1c}\Sigma = 0$. Let

$N^{1/2}[U' - J_{1c}/c] = b' = (b_1, \ldots, b_{c-1}, b_c) = (b_0', b_c)$.

Then it follows that $b_0' \Sigma_0^{-1} b_0$ has a limiting $\chi^2$ distribution with $c - 1$ degrees
of freedom, where $\Sigma_0$ denotes the asymptotic covariance matrix of $b_0$. From
(3.11) we have


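(3.12) $c^2(2c - 1)\Sigma_0 = a J_{c-1,c-1} + c^2 D_0 - c\,q_0 J_{1,c-1} - c J_{c-1,1} q_0'$,

where $D_0$ = diagonal $(1/p_k, k = 1, 2, \ldots, c - 1)$, $q_0' = (1/p_1, \ldots, 1/p_{c-1})$
and $a = \sum_{k=1}^{c} 1/p_k$.

CASE (i): $n_1 = n_2 = \cdots = n_c$. Then $p_i = 1/c$ and (3.12) gives $(2c - 1)\Sigma_0 =
cI - J_{c-1,c-1}$, so that

$\Sigma_0^{-1} = ((2c - 1)/c)[I + J_{c-1,c-1}]$,

and hence,

(3.13) $b_0' \Sigma_0^{-1} b_0 = \dfrac{N(2c - 1)}{c} \sum_{i=1}^{c} \left(u^{(i)} - \dfrac{1}{c}\right)^2$.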
CASE (ii): Not all n's are equal. Then $q_0$ and $J_{c-1,1}$ are linearly independent
and from (3.12) we have

$c^2(2c - 1)\Sigma_0 = c^2 D_0 - EF'$,

where

$E = [c\,q_0, J_{c-1,1}]$, $F = [J_{c-1,1}, c\,q_0 - aJ_{c-1,1}]$

are both of full rank, viz., two. Then

$[c^2(2c - 1)]^{-1}\Sigma_0^{-1} = c^{-2}D_0^{-1} - D_0^{-1}EAF'D_0^{-1}$,

where A is given by

$c^2(F'D_0^{-1}E - c^2 I)A = I$.

After simplification we finally have

(3.14) $b_0' \Sigma_0^{-1} b_0 = N(2c - 1)\left[\sum_{i=1}^{c} p_i \left(u^{(i)} - \dfrac{1}{c}\right)^2 - \left\{\sum_{i=1}^{c} p_i \left(u^{(i)} - \dfrac{1}{c}\right)\right\}^2\right]$.

It may be seen that the above expression reduces to (3.13) when $n_1 = n_2 =
\cdots = n_c$. It may be noted that the above expression is invariant under any
choice of $(c - 1)$ linearly independent u's. We have thus proved Theorem 3.1.
THEOREM 3.1. If $F_1 = F_2 = \cdots = F_c$ and $n_i = Np_i$, where the p's are fixed
numbers such that $\sum_i p_i = 1$, then the statistic V, defined by (2.2), has a limiting
$\chi^2$ distribution with $c - 1$ degrees of freedom as $N \to \infty$.


4. Consistency of the V-test. As mentioned earlier, if we assume that the
populations are approximately of the same form, then we may say that we are
testing for the equality of location parameters. Thus, we are primarily interested
in translation-type alternatives $F_i(x) = F(x - \theta_i)$, $i = 1, 2, \ldots, c$, where
the $\theta$'s are not all equal. We shall show that the V-test is consistent against
this class of alternatives.

We first state, without proof, the following straightforward extensions of a
lemma of Lehmann ([9], p. 169).

LEMMA 4.1. Let $\eta = f(F_1, F_2, \ldots, F_c)$ be a real-valued function such that
$f(F, F, \ldots, F) = \eta_0$ for all $(F, F, \ldots, F)$ in a class $C_0$. Let

$T_{n_1, \ldots, n_c} = t_{n_1, \ldots, n_c}(X_{11}, \ldots, X_{1n_1}; \ldots; X_{c1}, \ldots, X_{cn_c})$

be a sequence of real-valued statistics such that $T_{n_1, \ldots, n_c}$ tends to $\eta$ in probability as
$\min(n_1, \ldots, n_c) \to \infty$. Suppose that $f(F_1, F_2, \ldots, F_c) \ne \eta_0$ $[> \eta_0]$ for all
$(F_1, F_2, \ldots, F_c)$ in a class $C_1$. Then the sequence of tests which reject when
$|T_{n_1, \ldots, n_c} - \eta_0| > C_{n_1, \ldots, n_c}$ $[T_{n_1, \ldots, n_c} - \eta_0 > C_{n_1, \ldots, n_c}]$ is consistent for
testing H: $C_0$ at every fixed level of significance against the alternatives $C_1$.

LEMMA 4.2. Let $\eta^{(i)} = f^{(i)}(F_1, F_2, \ldots, F_c)$, $i = 1, 2, \ldots, g$, be real-valued
functions such that $f^{(i)}(F, F, \ldots, F) = \eta_0^{(i)}$ for all $(F, F, \ldots, F)$ in a class $C_0$.
Let

$T^{(i)}_{n_1, \ldots, n_c} = t^{(i)}_{n_1, \ldots, n_c}(X_{11}, \ldots, X_{1n_1}; \ldots; X_{c1}, \ldots, X_{cn_c})$, $i = 1, 2, \ldots, g$,

be sequences of real-valued statistics such that $T^{(i)}_{n_1, \ldots, n_c}$ tends to $\eta^{(i)}$ in probability
as $\min(n_1, \ldots, n_c) \to \infty$. Suppose that at least one $f^{(i)}(F_1, F_2, \ldots, F_c) \ne \eta_0^{(i)}$
for all $(F_1, F_2, \ldots, F_c)$ in a class $C_1$. Further, let

$w(T^{(1)}_{n_1, \ldots, n_c}, \ldots, T^{(g)}_{n_1, \ldots, n_c})$

be a nonnegative function which is zero if, and only if, $T^{(i)}_{n_1, \ldots, n_c} = \eta_0^{(i)}$ for all
$i = 1, \ldots, g$. Then the sequence of tests which reject when

$w(T^{(1)}_{n_1, \ldots, n_c}, \ldots, T^{(g)}_{n_1, \ldots, n_c}) > C_{n_1, \ldots, n_c}$

is consistent for testing H: $C_0$ at every fixed level of significance against the alternatives
$C_1$.

If we take $\eta^{(i)} = P[X_i < X_j$ for all $j = 1, \ldots, c$ except $i]$, where the X's
are independent random variables with continuous c.d.f. $F_1, F_2, \ldots, F_c$, respectively,
and $T^{(i)}_{n_1, \ldots, n_c} = u^{(i)}$, $i = 1, \ldots, c$, then the convergence in probability
of $u^{(i)}$ to $\eta^{(i)}$ follows from (iii) in Lemma 3.1. For the class $C_1$ of translation-type
alternatives $F_i(x) = F(x - \theta_i)$, where the $\theta$'s are not all equal, it
may be easily seen that $\eta^{(j)} > 1/c$, where $\theta_j$ is the (or one of the) least among
$\theta_1, \ldots, \theta_c$. The V-test, thus, is seen to be consistent against the class of translation-type
alternatives.

More generally, the V-test is consistent against the wider class of alternatives
for which $P[X_i < X_j$ for all $j = 1, \ldots, c$ except $i] \ne 1/c$ for at least one $i$ among
$(1, \ldots, c)$, where the X's are independent random variables with continuous
c.d.f. $F_1, F_2, \ldots, F_c$, respectively.


5. The asymptotic distribution of V under translation-type alternatives.
Andrews [1] has investigated the asymptotic efficiencies of Kruskal's H-test and
Mood's M-test and has concluded that the asymptotic efficiency of one relative
to the other is $\ge$ or $\le 1$, for the translation-type alternatives, depending on
the distribution function. It will be interesting (as suggested by Hoeffding and
the referee) to carry out similar studies on this test with respect to the two
previous tests. It is expected that the same type of conclusion will be reached.

Let us study the distribution of V, assuming a sequence of translation-type
alternative hypotheses $K_n$ for $n = 1, 2, \ldots$. The hypothesis $K_n$ specifies that
$F_i(x) = F(x - n^{-1/2}\theta_i)$, $i = 1, 2, \ldots, c$, where not all $\theta$'s are equal. The letter
$n$ will be used to index a sequence of situations in which $K_n$ is the true hypothesis.
The limiting probability distribution will then be found as $n \to \infty$.

THEOREM 5.1. For each index $n$ assume that $n_i = ns_i$, with $s_i$ a positive integer
and the truth of $K_n$.

If F possesses a continuous derivative $f$ and there exists a function $g$ such
that

$|[f(y + h) - f(y)]/h| \le g(y)$

and

$\int_{-\infty}^{\infty} g(y) f(y)\, dy < \infty$,

then, for $n \to \infty$, the statistic V has a limiting noncentral $\chi^2$ distribution with
$c - 1$ degrees of freedom and the noncentrality parameter$^1$

(5.1) $\lambda_V = (2c - 1)c^2 \sum_{i=1}^{c} s_i(\theta_i - \bar\theta)^2 \left[\int_{-\infty}^{\infty} [1 - F(y)]^{c-2} f^2(y)\, dy\right]^2$,

where $\bar\theta = \sum_i s_i\theta_i / \sum_i s_i$.
PROOF: Let $\eta_n^{(i)} = E\{\phi^{(i)}(X_1, X_2, \ldots, X_c) \mid K_n\}$; then it can be easily shown
that

$\eta_n^{(i)} = \dfrac{1}{c} - n^{-1/2}\delta_i\lambda + O(n^{-1})$,

where

$\delta_i = c\theta_i - \sum_{k=1}^{c} \theta_k$,

and

(5.2) $\lambda = \int_{-\infty}^{\infty} [1 - F(y)]^{c-2} f^2(y)\, dy$.

Similarly, it may be shown that

(5.3) $\Sigma_n = \Sigma + O(n^{-1/2})$,

where $\Sigma$ is given by (3.11) and $O(n^{-1/2})$ denotes a matrix whose elements are
$O(n^{-1/2})$.

Then, in view of Lemma 3.1, $N^{1/2}(U - \eta_n)$ is, in the limit as $n \to \infty$, normally distributed
with zero means and covariance matrix $\Sigma_n$, or, in view of (5.3), with asymptotic
covariance matrix $\Sigma$. Hence $N^{1/2}(U - c^{-1}J_{c1})$ has a limiting normal distribution
with mean-vector $-\lambda\left(\sum_i s_i\right)^{1/2}\delta$, where $\delta' = (\delta_1, \ldots, \delta_c)$, and covariance
matrix $\Sigma$. Thus V, in the limit as $n \to \infty$, is distributed as a noncentral $\chi^2$ with
$c - 1$ degrees of freedom and the noncentrality parameter

$\lambda_V = \left(\sum_i s_i\right)\lambda^2\, \delta_0' \Sigma_0^{-1} \delta_0$,

in the notation of Section 3. Since $\sum_i \delta_i = 0$, arguing exactly as from (3.11)
to (3.14), we see that $\lambda_V$ reduces to (5.1).


6. Asymptotic relative efficiency. Andrews [1] has shown that the H-statistic,
the M-statistic and the F-statistic are asymptotically distributed as noncentral
$\chi^2$ with $c - 1$ degrees of freedom and noncentrality parameters $\lambda_H$, $\lambda_M$ and
$\lambda_F$, respectively, where

$\lambda_H = 12\left[\int_{-\infty}^{\infty} F'(x)\, dF(x)\right]^2 \sum_{i=1}^{c} s_i(\theta_i - \bar\theta)^2$,

$\lambda_M = 4[F'(a)]^2 \sum_{i=1}^{c} s_i(\theta_i - \bar\theta)^2$,

and

$\lambda_F = \sum_i s_i(\theta_i - \bar\theta)^2 / \sigma_F^2$,

where $a$ is the median of F.

1 This was also obtained independently by Y. S. Sathe.

It is now well known ([1], [3]) that in such cases the asymptotic efficiency of
one statistic relative to the other is equal to the ratio of their noncentrality
parameters. Hence, we have the asymptotic efficiencies of the V-statistic relative
to the H, M and F statistics as follows:

$e_{V,H} = (2c - 1)c^2\lambda^2 \Big/ 12\left[\int_{-\infty}^{\infty} F'(x)\, dF(x)\right]^2$,

$e_{V,M} = (2c - 1)c^2\lambda^2 / 4[F'(a)]^2$,

and

$e_{V,F} = (2c - 1)c^2\lambda^2\sigma_F^2$,

respectively, where $\lambda$ is given by (5.2). These expressions are seen to be independent
of the scale parameter. For the uniform distribution the efficiencies
are given by

$e_{V,H} = e_{V,F} = e_{V,M}/3 = (2c - 1)c^2/12(c - 1)^2$,

so that we have

c         2     3     4     5     6     10    $\infty$
$e_{V,H}$  1.00  0.94  1.04  1.17  1.32  1.95  $\infty$.

For the exponential distribution, $f(y) = e^{-y}$, $0 \le y < \infty$, $e_{V,H} = e_{V,M}/3 =
e_{V,F}/3 = (2c - 1)/3$, so that

c         2     3     4     5     10    $\infty$
$e_{V,H}$  1.00  1.66  2.33  3.00  6.33  $\infty$.

For the normal distribution $\lambda$ can be computed from the Table I given by
Hojo [5] for $c \le 13$. We have

c         2     3     4     5     6     7     8     10    12    13
$e_{V,H}$  1.00  0.94  0.86  0.80  0.74  0.69  0.65  0.58  0.53  0.51,

while $e_{V,M} = 3e_{V,H}/2$ and $e_{V,F} = 3e_{V,H}/\pi$.
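
The tabulated efficiencies can be reproduced from (5.2) and the expression for $e_{V,H}$. The sketch below is illustrative only: it uses the closed forms $\lambda = 1/(c - 1)$ for the uniform and $\lambda = 1/c$ for the exponential distribution, and for the normal distribution it evaluates $\lambda$ by a composite Simpson rule instead of Hojo's table. The names `simpson`, `lam_normal` and `efficiency_vs_h` are hypothetical.

```python
from math import erf, exp, pi, sqrt


def simpson(f, lo, hi, n_steps=4000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / n_steps
    total = f(lo) + f(hi)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def normal_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def lam_normal(c):
    """lambda of (5.2) for the standard normal distribution."""
    return simpson(lambda y: (1.0 - normal_cdf(y)) ** (c - 2) * normal_pdf(y) ** 2,
                   -10.0, 10.0)


def efficiency_vs_h(c, lam, int_f_squared):
    """e_{V,H} = (2c - 1) c^2 lambda^2 / (12 [integral of f^2]^2)."""
    return (2 * c - 1) * c ** 2 * lam ** 2 / (12.0 * int_f_squared ** 2)


if __name__ == "__main__":
    for c in (2, 3, 4, 5, 10):
        uniform = efficiency_vs_h(c, 1.0 / (c - 1), 1.0)
        exponential = efficiency_vs_h(c, 1.0 / c, 0.5)
        normal = efficiency_vs_h(c, lam_normal(c), 1.0 / (2.0 * sqrt(pi)))
        print(c, round(uniform, 2), round(exponential, 2), round(normal, 2))
```

Here the integral of $f^2$ equals 1 for the uniform, 1/2 for the exponential and $1/(2\sqrt{\pi})$ for the standard normal distribution.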
For the normal distribution, the asymptotic efficiency of the V-statistic relative
to the Kruskal-Wallis H-statistic tends to zero as the number of populations
tends to infinity. I am thankful to the referee for supplying the following indication
of the proof.

OUTLINE OF THE PROOF. We must show that

$n^{3/2} \int_{-\infty}^{\infty} [\Phi(x)]^n \phi^2(x)\, dx \to 0$.

On integrating by parts, it is seen that

$\int_{-\infty}^{\infty} [\Phi(x)]^n \phi^2(x)\, dx = \dfrac{1}{n + 1} \int_{-\infty}^{\infty} x[\Phi(x)]^{n+1} \phi(x)\, dx = \dfrac{1}{n + 1} \int_0^1 y^{n+1} \Phi^{-1}(y)\, dy$.

It is therefore enough to prove that

$n^{1/2} \int_0^1 x^n \Phi^{-1}(x)\, dx \to 0$.

We shall prove this using the fact that

$\int_0^1 x^n [\log(x^{-1})]^{-1/2}\, dx = \left(\dfrac{\pi}{n + 1}\right)^{1/2}$.

It is easily seen by de l'Hospital's rule that

$\dfrac{\Phi^{-1}(x)}{[\log(x^{-1})]^{-1/2}} \to 0$ as $x \to 1$.

Given any $\epsilon > 0$ there exists therefore a constant $a$ ($\tfrac{1}{2} < a < 1$) such that

$\Phi^{-1}(x) < \epsilon[\log(x^{-1})]^{-1/2}$ for $a < x < 1$,

and hence

$\int_a^1 x^n \Phi^{-1}(x)\, dx < \epsilon \int_a^1 x^n [\log(x^{-1})]^{-1/2}\, dx < \epsilon \left(\dfrac{\pi}{n + 1}\right)^{1/2}$.

Finally it is easily seen that for fixed $a$, $\int_0^a x^n \Phi^{-1}(x)\, dx$ tends to 0 at a faster rate
than $\int_a^1 x^n \Phi^{-1}(x)\, dx$ as $n \to \infty$. Given $\epsilon$ and hence $a$, there therefore exists $n_0$ so
that $n \ge n_0$ implies

$\int_0^1 x^n \Phi^{-1}(x)\, dx < 2 \int_a^1 x^n \Phi^{-1}(x)\, dx < 2\epsilon \left(\dfrac{\pi}{n + 1}\right)^{1/2}$

and this completes the proof.
DISTRIBUTION OF THE ANDERSON-DARLING STATISTIC


By Peter A. W. Lewis 
IBM Watson Laboratory, New York! 


In [1] and [2] Anderson and Darling proposed the use of the statistic

(1) $W_n^2 = n \int_{-\infty}^{\infty} \dfrac{[G_n(x) - G(x)]^2}{G(x)[1 - G(x)]}\, dG(x)$

for testing the hypothesis that a sample of size n has been drawn from a population
with a specified continuous cumulative distribution function G(x). In (1)
$G_n(x)$ is the empirical distribution function defined on the sample of size n.

We consider here the problem of determining and tabulating the distribution
function, $F(z; n) = \Pr\{W_n^2 \le z\}$, of this statistic. In [1], the asymptotic distribution
of this statistic under the null hypothesis was derived and, rewritten
in a form convenient for computation, it is given by

(2) $F(z; \infty) = \lim_{n \to \infty} \Pr\{W_n^2 \le z\} = \sum_{j=0}^{\infty} a_j (z b_j)^{-1/2} \exp[-b_j/z] \int_0^{\infty} f_j(y) \exp[-y^2]\, dy$,

where

(3) $f_j(y) = \exp[\tfrac{1}{8} z b_j/(y^2 z + b_j)]$, $a_j = (-1)^j (2)^{1/2} (4j + 1)\, \Gamma(j + \tfrac{1}{2})/j!$, $b_j = \tfrac{1}{8}(4j + 1)^2 \pi^2$.

Using the calculated values of the $a_j$'s and $b_j$'s, and the fact that

$\int_0^{\infty} f_j(y) e^{-y^2}\, dy \le \tfrac{1}{2}(\pi)^{1/2} \exp[z/8]$,

it can be determined that no more than two terms of the sum (j = 0, 1) are
needed to evaluate $F(z; \infty)$ to five decimal places over the range of z which is of
interest. This range is $0 < z \le 8$, since for all n, F(8; n) = 1.000, rounded to
three decimal places. The integral in each term of the sum was evaluated numerically
using Hermite-Gauss quadrature numerical-integration formulas (p. 327 of
[3], [4]). This method of numerical integration is very efficient in terms of computing
time and gives sufficient accuracy to determine $F(z; \infty)$ to five decimal
places.
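
The two-term evaluation of the series can be sketched as follows. This is a minimal illustration, not the author's program: the integral is computed with a composite Simpson rule on a truncated range rather than with Hermite-Gauss quadrature, the coefficients are taken in the form $a_j = (-1)^j 2^{1/2} (4j + 1)\Gamma(j + 1/2)/j!$ and $b_j = (4j + 1)^2\pi^2/8$, and the names `simpson` and `ad_asymptotic_cdf` are hypothetical.

```python
from math import exp, factorial, gamma, pi, sqrt


def simpson(f, lo, hi, n_steps=800):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / n_steps
    total = f(lo) + f(hi)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def ad_asymptotic_cdf(z, n_terms=2, upper=8.0):
    """Approximate F(z; infinity) from the first `n_terms` terms of the series.
    The integral over (0, infinity) is truncated at `upper`, where exp(-y^2)
    is negligible."""
    if z <= 0.0:
        return 0.0
    total = 0.0
    for j in range(n_terms):
        a_j = (-1) ** j * sqrt(2.0) * (4 * j + 1) * gamma(j + 0.5) / factorial(j)
        b_j = (4 * j + 1) ** 2 * pi ** 2 / 8.0
        integrand = lambda y, b=b_j: exp(z * b / (8.0 * (y * y * z + b)) - y * y)
        total += a_j * exp(-b_j / z) / sqrt(z * b_j) * simpson(integrand, 0.0, upper)
    return total


if __name__ == "__main__":
    for z in (1.933, 2.492):
        print(z, ad_asymptotic_cdf(z))
```

Evaluated at the asymptotic 0.100 and 0.050 significance points 1.933 and 2.492, the sketch should return values close to 0.90 and 0.95 respectively.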

The results of these calculations of $F(z; \infty)$, rounded to four decimal places,

Received November 23, 1960; revised June 13, 1961. 

1 Present address: IBM Research Laboratory, Monterey and Cottle Roads, San Jose 14, 
California. 




are given in the last column of Table I which appears at the end of this article.
The asymptotic significance points given in [2] were verified and are shown in
Table II.

An equivalent form of the statistic $W_n^2$ is given by

(4) $W_n^2 = -n - (n^{-1}) \sum_{i=1}^{n} [(2i - 1) \ln G(X_{(i)}) + (2(n - i) + 1) \ln(1 - G(X_{(i)}))]$,

where the $X_{(i)}$ are the order statistics of a sample of size n. It is well known that,
under the null-hypothesis, the transformation $G(X_{(i)})$ takes the $X_{(i)}$ into the
order statistics $U_{(i)}$ of a sample of size n from a population with the uniform
(0, 1) distribution, giving

(5) $W_n^2 = -n - (n^{-1}) \sum_{i=1}^{n} [(2i - 1) \ln U_{(i)} + (2(n - i) + 1) \ln(1 - U_{(i)})]$.

This shows clearly the distribution-free property of this statistic under the null
hypothesis, and allows us to determine very simply that, for any n, the minimum
value which the random variable $W_n^2$ can attain is

(6) $z(\min) = -n - \dfrac{1}{n} \sum_{i=1}^{n} \ln\left[\left(\dfrac{2i - 1}{2n}\right)^{2i-1} \left(\dfrac{2(n - i) + 1}{2n}\right)^{2(n-i)+1}\right]$.

These values are tabulated in Table II (following Table I), in the row entitled
F(z) = 0.
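
The computing form (5) and the minimum (6) can be sketched in Python as follows; the function names `anderson_darling` and `z_min` are hypothetical, and the input is assumed to be already transformed to the uniform (0, 1) scale with every value strictly between 0 and 1.

```python
from math import log


def anderson_darling(u):
    """W_n^2 computed from equation (5) for values u_1, ..., u_n in (0, 1)."""
    ordered = sorted(u)
    n = len(ordered)
    total = sum((2 * i - 1) * log(x) + (2 * (n - i) + 1) * log(1.0 - x)
                for i, x in enumerate(ordered, start=1))
    return -n - total / n


def z_min(n):
    """Minimum attainable value of W_n^2, equation (6)."""
    total = sum((2 * i - 1) * log((2 * i - 1) / (2 * n))
                + (2 * (n - i) + 1) * log((2 * (n - i) + 1) / (2 * n))
                for i in range(1, n + 1))
    return -n - total / n


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(z_min(n), 4))
```

Equation (6) is equation (5) evaluated at $U_{(i)} = (2i - 1)/2n$, which is why `z_min(n)` agrees with `anderson_darling` applied to those points. For n = 2 and n = 3 the sketch gives 0.2493 and 0.1885.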
From equation (5), we find that

(7) $W_1^2 = -1 - \ln[U(1 - U)]$,

where U is uniform (0, 1), so that

(8) $F(z; 1) = (1 - 4\exp[-(z + 1)])^{1/2}$ for $z \ge 0.38629$,
    $F(z; 1) = 0$ for $z < 0.38629$.

Values of F(z; 1), rounded to three decimal places, are given in Table I.
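
The exact result (8) gives a convenient check on a synthetic-sampling program before it is used for larger n. The sketch below compares (8) with a simple Monte Carlo estimate based on (7); the function names, the seed and the number of trials are illustrative assumptions.

```python
import random
from math import exp, log, sqrt

Z_MIN_1 = -1.0 + log(4.0)  # minimum of W_1^2, about 0.38629


def f_exact_n1(z):
    """Exact distribution function of W_1^2, equation (8)."""
    if z < Z_MIN_1:
        return 0.0
    return sqrt(max(0.0, 1.0 - 4.0 * exp(-(z + 1.0))))


def f_monte_carlo_n1(z, m, seed=0):
    """Empirical distribution function F_m(z; 1) from m synthetic samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        u = rng.random()
        # Equation (7); u == 0 would give an infinite statistic, so skip it.
        if 0.0 < u < 1.0 and -1.0 - log(u * (1.0 - u)) <= z:
            hits += 1
    return hits / m


if __name__ == "__main__":
    print(f_exact_n1(1.0), f_monte_carlo_n1(1.0, m=20_000))
```

With m = 20,000 the standard error of the estimate is about 0.003, so the two printed values should agree to roughly two decimal places.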

For $n \ge 2$ and finite, resort was had to synthetic sampling (Monte Carlo)
methods on an IBM 704 Computer to determine the distribution function
$F(z; n) = \Pr\{W_n^2 \le z\}$. This is done, using equation (5), by artificially generating
m samples of size n from a uniform distribution. The result of this process
is an empirical distribution function, $F_m(z; n)$, which is used as an estimate of
$F(z; n)$. The $F_m(z; n)$ are tabulated in Table I for n = 2 up to n = 8.

It is necessary to make a determination of the accuracy of these estimates
$F_m(z; n)$ of $F(z; n)$ for a given m. This can be done in either of two ways, as
follows:

(1). For very large m, $mF_m(z; n)$ is approximately normally distributed,
with mean $mF(z; n)$ and variance $mF(z; n)(1 - F(z; n))$. Therefore a confidence
interval with confidence coefficient $1 - \alpha$ for $F(z; n)$, at any point z,
is given by

(9) $\{F_m(z; n) \pm t_{\alpha/2}([F(z; n)(1 - F(z; n))]/m)^{1/2}\}$.

In this expression $t_{\alpha/2}$ is the upper $\alpha/2$ point of the N(0, 1) distribution. In
most tables the "error" estimate used is essentially the above confidence interval
with confidence coefficient 0.6827, i.e., $t_{\alpha/2} = 1$. Now $F(z; n)$ is unknown, but
$F(z; n)(1 - F(z; n))$ is maximum at $F(z; n) = 0.5$, so that to keep this "error"
less than or equal to .0005 over the range of z, one requires $m = 10^6$.

(2). Another means of evaluating the error is by using the Kolmogorov-Smirnov
statistic, from which, for large m, we can say that we are 95 percent
sure that $F_m(z; n)$ will stay within $1.36(m)^{-1/2}$ of the true distribution $F(z; n)$
for all z, i.e., over the entire distribution. Therefore to make this statement for
a deviation of .0005, we need $m = 7.398 \times 10^6$.
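
The two sample-size requirements follow from one line of arithmetic each, sketched below; the function names are hypothetical, and the constants 1 and 1.36 are the ones quoted in the text.

```python
def m_for_pointwise_error(error, t=1.0):
    """Samples needed so that t * sqrt(F(1 - F)/m) <= error in the worst
    case F = 0.5, where F(1 - F) = 1/4."""
    return (t / (2.0 * error)) ** 2


def m_for_ks_band(error, k=1.36):
    """Samples needed so that the Kolmogorov-Smirnov band k / sqrt(m) <= error."""
    return (k / error) ** 2


if __name__ == "__main__":
    print(m_for_pointwise_error(0.0005))  # about 1.0e6
    print(m_for_ks_band(0.0005))          # about 7.398e6
```

The ratio of the two requirements is $(2 \times 1.36)^2$, about 7.4, which is the price of a statement holding simultaneously over the whole range of z rather than at a single point.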

Unfortunately the time available for computation limited the value of m
used in these computations to $m = 10^6$ for n = 2, and to $m = .25 \times 10^6$ for
n = 3, 4, 5, 6, 7, and 8. Thus, using the Kolmogorov-Smirnov criterion, the
values $F_m(z; 2)$ given in Table I are within .00163 of $F(z; 2)$ with probability
0.95, and for n = 3, 4, 5, 6, 7, and 8 the values $F_m(z; n)$ are within .00326 of
$F(z; n)$ with probability 0.95.

Determination of the distribution of $W_n^2$ for n > 8 by Monte Carlo methods
is prohibitive, since for n = 8 and m = 250,000, six hours of computing time
were required. This is quite indicative of the inefficiency and impracticability
of simple Monte Carlo methods as a means of solving distribution theory problems
when the entire distribution function is required with great accuracy.
Furthermore, it is doubtful whether modified Monte Carlo methods ([5], [6],
[7]) could be used to advantage here.

Fortunately the convergence of the distribution of $W_n^2$ to its asymptotic distribution
is quite rapid. Thus, from Table I the maximum deviation at the tabulated
points between the asymptotic distribution and the distribution for n = 8
is approximately 0.006. For F(z) = 0.8, which is of most interest, this difference
is only 0.001, so that for practical purposes the asymptotic distribution can be
used for n > 8.

Significance points for $W_n^2$ are given in Table II, for significance levels 0.100,
0.050, 0.010. For n = 1 and $n \to \infty$ these values are exact; the others are obtained
by inverse interpolation from Table I and are only approximate.


Acknowledgments. I wish to thank Mr. H. Serenson for his help in program- 
ming this problem. 
TABLE I. Values of F(z; n): exact for n = 1 and $n \to \infty$; estimated for n = 2, 3, 4, 5, 6, 7, and 8

(The entries of Table I are not legible in the source.)
TABLE II. Significance points and values of z(min)

$z = F^{-1}(F(z))$

0.2493 0.1885 0.1533 0.1304 0.1135 0.1043 0.0911 0
1.98 1.94 1.94 1.933
2.52 2.492
3.95 3.857




ERRORS IN DISCRIMINATION 
By S. JOHN


Indian Statistical Institute, Calcutta 


Summary. The probabilities of misclassification involved in the use of esti- 
mated discriminant functions are subject to chance variations. The author’s 
purpose in this paper is to derive the distribution laws that the probabilities of 
misclassification follow and to obtain their expected values. The parent popula- 
tions are assumed to be normal. The first part of the paper considers the uni- 
variate case and the second part the multivariate case. The discussion of the 
multivariate case proceeds in three stages of increasing complexity. When the 
exact results are complicated, asymptotic results or approximations are given. 
Finally, the problem of estimating the expected probabilities of misclassification 
is considered. Interval estimates as well as point estimates are given. 


1. Introduction. Multivariate statistical methods have been found extremely 
useful in devising efficient procedures for the solution of taxonomic problems.
About twenty-five years ago Sir Ronald A. Fisher was consulted by M. M. 
Barnard as to the best method of classifying skeletal remains unearthed by 
archaeological excavations. Fisher suggested the use of the now well-known 
discriminant function [4], [7]. A general mathematical theory of statistical 
taxonomy was built by Welch [23] on foundations laid by Neyman and Pearson’s 
theory of tests of hypotheses. Subsequent authors introduced many refinements. 
For a fairly complete account of the theory as it has developed during these 
years see chapter six of [3] or chapter eight of [18] and literature cited therein. 

The situation we are considering is the following: We have an individual who
has come from one of the two populations $P^{(1)}$, $P^{(2)}$, but from which one is not
known. It is required to devise a procedure that ensures a high probability of a
correct classification of the individual. To come to a decision various
characteristics of the individual are measured. Suppose we have measurements on $p$
characteristics. Let the vector of measurements be $\mathbf{x} = (x_1, x_2, \ldots, x_p)$. Let
the distribution of these measurements in $P^{(k)}$ have $\boldsymbol{\mu}^{(k)}$ as its mean vector.
Assume that the dispersion matrix is the same in both the populations. Denote
this common dispersion matrix by $\boldsymbol{\Sigma}$. The discriminant function is then the
linear function $(\boldsymbol{\mu}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}\mathbf{x}'$. We shall set

(1) $D(\mathbf{x}; \boldsymbol{\mu}^{(1)}, \boldsymbol{\mu}^{(2)}; \boldsymbol{\Sigma}) = (\boldsymbol{\mu}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}\mathbf{x}'.$

The procedure usually adopted¹ is to classify the individual as belonging to
$P^{(1)}$ or $P^{(2)}$ according as

(2) $D(\mathbf{x}; \boldsymbol{\mu}^{(1)}, \boldsymbol{\mu}^{(2)}; \boldsymbol{\Sigma}) \lessgtr D(\tfrac{1}{2}[\boldsymbol{\mu}^{(1)} + \boldsymbol{\mu}^{(2)}]; \boldsymbol{\mu}^{(1)}, \boldsymbol{\mu}^{(2)}; \boldsymbol{\Sigma}).$

The above procedure is possible only if $\boldsymbol{\mu}^{(k)}$ ($k = 1, 2$) and $\boldsymbol{\Sigma}$ are known. But
usually such is not the case. We may then try to estimate the unknown
parameters $\boldsymbol{\mu}^{(1)}$, $\boldsymbol{\mu}^{(2)}$ and $\boldsymbol{\Sigma}$ from random samples from $P^{(1)}$ and $P^{(2)}$, substitute these
estimates in the appropriate places and use the resulting function to classify
individuals in exactly the same way as $D(\mathbf{x}; \boldsymbol{\mu}^{(1)}, \boldsymbol{\mu}^{(2)}; \boldsymbol{\Sigma})$ is used.

¹ Certain situations require a slightly modified procedure. See Section 12.

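Let $x_{ir}^{(k)}$ ($i = 1, 2, \ldots, p$; $r = 1, 2, \ldots, N_k$) be a random sample of size $N_k$
from population $P^{(k)}$. Put

(6) $\mathbf{S} =$

The vector $\bar{\mathbf{x}}^{(k)}$ is an estimate of $\boldsymbol{\mu}^{(k)}$ and the $p \times p$ matrix $\mathbf{S}$ is an estimate of
the dispersion matrix $\boldsymbol{\Sigma}$. Substituting these estimates in $D(\mathbf{x}; \boldsymbol{\mu}^{(1)}, \boldsymbol{\mu}^{(2)}; \boldsymbol{\Sigma})$ we
get $(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}\mathbf{x}'$. Using this function we may assign individuals to $P^{(1)}$
or $P^{(2)}$ according as

(7) $(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}\mathbf{x}' \lessgtr \tfrac{1}{2}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}(\bar{\mathbf{x}}^{(2)} + \bar{\mathbf{x}}^{(1)})'.$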

In any classification procedure there are chances for two kinds of errors: (1)
we may classify an individual from $P^{(1)}$ as belonging to $P^{(2)}$; (2) we may classify
an individual from $P^{(2)}$ as belonging to $P^{(1)}$. It is clear that if an individual is
assigned to $P^{(1)}$ or $P^{(2)}$ depending on the value of a linear function $\sum c_i x_i$, these
two chances will depend on the particular coefficients $c_i$ used. Now, in
$(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}\mathbf{x}'$, the coefficients of $x_1, x_2, \ldots, x_p$ are respectively the
components of the vector $(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}$. These components are random variables.
Random fluctuations in the coefficients induce random fluctuations in the chances
of committing either kind of error and it is of interest to study these random
fluctuations. This is what we do in the present paper. We assume $P^{(k)}$ ($k = 1, 2$)
to be normal.

Wald's paper [22] appears to be the earliest one to discuss problems connected
with the classification of an individual to $P^{(1)}$ or $P^{(2)}$, when the distributions
of the characteristics in $P^{(1)}$ and $P^{(2)}$ are not completely known. He considers
the use of the statistic $(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}\mathbf{x}'$. Wald had visualized a way of using
this statistic slightly different from the one which we described. He required
the distribution of $(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}\mathbf{x}'$ to set up the classification procedure. Papers
[2], [8], [9], [10], [19] and [22] are partly or wholly concerned with the derivation
of this distribution.² In [2], [10], [17] and [19], other statistics which can be used
similarly are considered.

² A referee informs the author that Elfving has given an expansion for the unconditional
probability in the univariate case and that Bowker and Sitgreaves have given an asymptotic
expansion for the distribution function of the classification statistic when all parameters
are estimated, in papers written for a forthcoming publication, Mathematical Studies in
Item Selection and Classification, to be published by the Stanford University Press.

2. Notation. Besides the symbols already introduced in the introduction, we
use other symbols also. We shall here explain the manner in which these symbols
are to be construed.

To distinguish vectors and matrices from scalars we shall employ small bold
face type to denote row vectors and capital bold face letters to denote matrices.
The same letters, when primed, stand for the transposes of the vectors
or matrices.

The letter $\mathbf{I}$ will denote the identity matrix of order $p$. If $\mathbf{u} = (u_1, u_2, \ldots, u_p)$,
we shall set

(8) $d\mathbf{u} = du_1\, du_2 \cdots du_p.$

The symbol $g(x)$ will denote the standard normal density. The integral of $g(x)$
from $-\infty$ to $x$ will be denoted by $G(x)$. The function inverse to $G(x)$ will get
the symbol $G^{-1}(x)$. We define

(9) $G(x_1, x_2; \rho) = (2\pi)^{-1}(1 - \rho^2)^{-1/2}\int_{-\infty}^{x_1}\int_{-\infty}^{x_2}\exp[-\tfrac{1}{2}(1 - \rho^2)^{-1}(u_1^2 - 2\rho u_1 u_2 + u_2^2)]\, du_1\, du_2.$

The symbol $I_x(p, q)$ will stand for the incomplete beta function,

(10) $I_x(p, q) = \dfrac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)}\int_0^x u^{p-1}(1 - u)^{q-1}\, du.$

Finally, we set

(11) $\delta^2 = (\boldsymbol{\mu}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}^{(2)} - \boldsymbol{\mu}^{(1)})'.$


Besides the symbols introduced in this section, we use others locally. They 
will be explained at the appropriate places. 


DISCRIMINATION USING A SINGLE CHARACTERISTIC 


3. Introduction to univariate case. In the univariate case we shall for
convenience write $\mu^{(k)}$ for $\mu_1^{(k)}$, $\bar{x}^{(k)}$ for $\bar{x}_1^{(k)}$, and $x$ for $x_1$.

It is easy to see that the general classification procedure described in the
introduction reduces in the univariate case to the following: If $\bar{x}^{(2)} > \bar{x}^{(1)}$, assign
the individual to $P^{(1)}$ or $P^{(2)}$ according as $x \lessgtr [\bar{x}^{(1)} + \bar{x}^{(2)}]/2$. If $\bar{x}^{(2)} < \bar{x}^{(1)}$, assign
the individual to $P^{(1)}$ or $P^{(2)}$ according as $x \gtrless [\bar{x}^{(1)} + \bar{x}^{(2)}]/2$.

Suppose $\bar{x}^{(1)}$ and $\bar{x}^{(2)}$ are given. We shall denote by $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$ the conditional
probability of assigning an individual from $P^{(1)}$ to $P^{(2)}$ and by $e_{21}(\bar{x}^{(1)}, \bar{x}^{(2)})$ the
conditional probability of assigning an individual from $P^{(2)}$ to $P^{(1)}$. Then

(12) $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)}) = \begin{cases} G([\sigma^{(1)}]^{-1}\{\mu^{(1)} - \tfrac{1}{2}(\bar{x}^{(1)} + \bar{x}^{(2)})\}) & \text{if } \bar{x}^{(1)} < \bar{x}^{(2)}, \\ G([\sigma^{(1)}]^{-1}\{\tfrac{1}{2}(\bar{x}^{(1)} + \bar{x}^{(2)}) - \mu^{(1)}\}) & \text{if } \bar{x}^{(1)} > \bar{x}^{(2)}. \end{cases}$

Here $\sigma^{(k)}$ denotes the standard deviation of the characteristic under
consideration, in population $P^{(k)}$. A similar equation for $e_{21}(\bar{x}^{(1)}, \bar{x}^{(2)})$ can be written down
at once. We shall obtain the distribution and expected value of $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$.
A discussion of $e_{21}(\bar{x}^{(1)}, \bar{x}^{(2)})$ would be completely analogous.

The classification procedure we have described is usually adopted only if
$\sigma^{(1)} = \sigma^{(2)}$. However, in obtaining the distribution and expected value of
$e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$ we shall not assume that $\sigma^{(1)} = \sigma^{(2)}$; there is some interest in
studying the chances of errors under the more general set-up, since, although the
classification procedure was designed on the assumption that $\sigma^{(1)} = \sigma^{(2)}$, there is
a possibility that the assumption was false.
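The univariate rule and the conditional error probability of equation (12) can be sketched in a few lines of Python. This is only an illustration: the function names `assign` and `e12` are hypothetical, the true mean and standard deviation of $P^{(1)}$ are taken as given, and the tie $\bar{x}^{(1)} = \bar{x}^{(2)}$ (an event of probability zero) is arbitrarily handled by the second branch.

```python
from statistics import NormalDist

G = NormalDist().cdf  # standard normal distribution function G(x)


def assign(x, xbar1, xbar2):
    """Return 1 or 2: the population to which the measurement x is assigned."""
    midpoint = 0.5 * (xbar1 + xbar2)
    if xbar2 > xbar1:
        return 1 if x < midpoint else 2
    return 1 if x > midpoint else 2


def e12(xbar1, xbar2, mu1, sigma1):
    """Conditional probability of sending an individual from P(1) to P(2),
    given the two sample means (equation (12))."""
    midpoint = 0.5 * (xbar1 + xbar2)
    if xbar1 < xbar2:
        return G((mu1 - midpoint) / sigma1)
    return G((midpoint - mu1) / sigma1)
```

With sample means 0 and 2, true mean 0 and unit standard deviation, the cut-off is 1 and `e12(0.0, 2.0, 0.0, 1.0)` equals $G(-1)$.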


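4. The distribution of $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$. The quantity $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$ can be less than
$z$ if and only if either of the following two events happens:

$\bar{x}^{(1)} < \bar{x}^{(2)}$ and $\tfrac{1}{2}[\bar{x}^{(1)} + \bar{x}^{(2)}] - \mu^{(1)} > -\sigma^{(1)}G^{-1}(z)$,

$\bar{x}^{(1)} > \bar{x}^{(2)}$ and $\tfrac{1}{2}[\bar{x}^{(1)} + \bar{x}^{(2)}] - \mu^{(1)} < \sigma^{(1)}G^{-1}(z)$.

The distribution function of $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$ is therefore given by the equation

(13) $\Pr(e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)}) < z) = G(h_{11}, h_{21}; \rho) + G(h_{12}, h_{22}; \rho),$

where

(14) $h_{11} = (N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2)^{-1/2}(\mu^{(2)} - \mu^{(1)}) = -h_{12},$

(15) $h_{21} = (N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2)^{-1/2}(2\sigma^{(1)}G^{-1}(z) + \mu^{(2)} - \mu^{(1)}),$

(16) $h_{22} = (N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2)^{-1/2}(2\sigma^{(1)}G^{-1}(z) - \mu^{(2)} + \mu^{(1)}),$

and

(17) $\rho = (N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2)^{-1}(N_2^{-1}[\sigma^{(2)}]^2 - N_1^{-1}[\sigma^{(1)}]^2).$

The expression on the right hand side in equation (13) can be evaluated using
the tables of $G(x_1, x_2; \rho)$ given in [15]. If $N_1^{-1}[\sigma^{(1)}]^2 = N_2^{-1}[\sigma^{(2)}]^2$,

(18) $\Pr(e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)}) < z) = G(h_{11})G(h_{21}) + G(h_{12})G(h_{22}),$

and hence can be evaluated with the help of the tables of $G(x)$ given in [14].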
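Where the paper refers to tables of $G(x)$, the special case (18) can nowadays be evaluated directly. The following is a minimal sketch under the stated condition $N_1^{-1}[\sigma^{(1)}]^2 = N_2^{-1}[\sigma^{(2)}]^2$; the function name and argument order are illustrative assumptions, and the general case (13) would need a bivariate normal routine that is not attempted here.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def cdf_e12_equal_ratio(z, mu1, mu2, sigma1, sigma2, n1, n2):
    """Pr(e12 < z) from equation (18); valid only when sigma1^2/n1 == sigma2^2/n2."""
    if not 0.0 < z < 1.0:
        raise ValueError("z must lie strictly between 0 and 1")
    if not math.isclose(sigma1**2 / n1, sigma2**2 / n2):
        raise ValueError("equation (18) requires sigma1^2/n1 == sigma2^2/n2")
    s = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    q = 2.0 * sigma1 * _N.inv_cdf(z)      # 2 * sigma(1) * G^{-1}(z)
    h11 = (mu2 - mu1) / s                 # equation (14)
    h12 = -h11
    h21 = (q + mu2 - mu1) / s             # equation (15)
    h22 = (q - mu2 + mu1) / s             # equation (16)
    return _N.cdf(h11) * _N.cdf(h21) + _N.cdf(h12) * _N.cdf(h22)
```

As $z$ approaches one, $h_{21}$ and $h_{22}$ grow without bound and the value tends to $G(h_{11}) + G(-h_{11}) = 1$, as a distribution function should.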


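5. Expected value of $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$. The expected value of $e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$ can
be calculated from the equation

(19) $E e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)}) = G(a_{11}, a_{21}; \rho) + G(a_{12}, a_{22}; \rho),$

where

(20) $a_{11} = (N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2)^{-1/2}(\mu^{(2)} - \mu^{(1)}) = -a_{12},$
$\quad a_{21} = \tfrac{1}{2}([\sigma^{(1)}]^2 + \tfrac{1}{4}\{N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2\})^{-1/2}(\mu^{(1)} - \mu^{(2)}) = -a_{22},$

and

(21) $\rho = \tfrac{1}{2}(N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2)^{-1/2}([\sigma^{(1)}]^2 + \tfrac{1}{4}\{N_1^{-1}[\sigma^{(1)}]^2 + N_2^{-1}[\sigma^{(2)}]^2\})^{-1/2}(N_1^{-1}[\sigma^{(1)}]^2 - N_2^{-1}[\sigma^{(2)}]^2).$

If $N_1^{-1}[\sigma^{(1)}]^2 = N_2^{-1}[\sigma^{(2)}]^2$, $E e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)})$ can be evaluated using only tables of
$G(x)$; for, in this case,

(22) $E e_{12}(\bar{x}^{(1)}, \bar{x}^{(2)}) = G(a_{11})G(a_{21}) + G(a_{12})G(a_{22}).$

Equation (19) is easily established if we observe that a wrong assignment of
an individual from $P^{(1)}$ corresponds to the occurrence of either of the following
two events:

$\bar{x}^{(1)} < \bar{x}^{(2)}$ and $x > \tfrac{1}{2}[\bar{x}^{(1)} + \bar{x}^{(2)}]$,

$\bar{x}^{(1)} > \bar{x}^{(2)}$ and $x < \tfrac{1}{2}[\bar{x}^{(1)} + \bar{x}^{(2)}]$.

The reader may wish to compare our treatment of the univariate case with that
of [11].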
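The two events just listed lead directly to a short computation in the uncorrelated case (22). The sketch below works from those events: $\bar{x}^{(1)} - \bar{x}^{(2)}$ and $\tfrac{1}{2}[\bar{x}^{(1)} + \bar{x}^{(2)}] - x$ are standardized and, when $N_1^{-1}[\sigma^{(1)}]^2 = N_2^{-1}[\sigma^{(2)}]^2$, treated as independent. The function name is an assumption made for illustration.

```python
import math
from statistics import NormalDist

G = NormalDist().cdf


def expected_e12_equal_ratio(mu1, mu2, sigma1, sigma2, n1, n2):
    """E e12 when sigma1^2/n1 == sigma2^2/n2, so that the two standardized
    variables are uncorrelated and the bivariate integral factorizes."""
    if not math.isclose(sigma1**2 / n1, sigma2**2 / n2):
        raise ValueError("requires sigma1^2/n1 == sigma2^2/n2")
    s2 = sigma1**2 / n1 + sigma2**2 / n2          # variance of xbar1 - xbar2
    a11 = (mu2 - mu1) / math.sqrt(s2)
    a21 = 0.5 * (mu1 - mu2) / math.sqrt(sigma1**2 + 0.25 * s2)
    return G(a11) * G(a21) + G(-a11) * G(-a21)
```

Two sanity checks follow from the formula itself: with identical populations the value is exactly one half, and with very large samples it approaches $G(-\tfrac{1}{2}|\mu^{(2)} - \mu^{(1)}|/\sigma)$, the error rate of the rule with known parameters.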


DISCRIMINATION USING MORE THAN ONE CHARACTERISTIC 


6. Introduction to the multivariate case. We now take up for consideration 
the multivariate case. The procedure discussed is the one described in the intro- 
duction. It is an adaptation of the standard discriminant function analysis to 
situations where the parameters required for the construction of the discriminant 
function are unknown. 

Classification procedures based on the correct discriminant function are known 
to be the best possible when the distributions in the two populations are multi- 
variate normal with identical dispersion matrices. We shall, throughout our 
discussion, assume that the distributions in the two populations, do, in fact, 
satisfy these conditions. 

The discussion will proceed in several stages. We shall, at stage number one,
assume that only $\boldsymbol{\mu}^{(2)}$ is unknown. The case where only $\boldsymbol{\mu}^{(1)}$ is unknown is
completely analogous and does not require separate consideration. At stage two we
shall only assume that the dispersion matrix $\boldsymbol{\Sigma}$ is known. In the third stage we
shall not assume that $\boldsymbol{\mu}^{(1)}$, $\boldsymbol{\mu}^{(2)}$, or $\boldsymbol{\Sigma}$ are known.


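7. Case one: only $\boldsymbol{\mu}^{(2)}$ is unknown. Before starting discussion of this case let
us note that we shall not err seriously if we take $\bar{\mathbf{x}}^{(1)}$ to be the true value of
$\boldsymbol{\mu}^{(1)}$ and $\mathbf{S}$ to be the true value of $\boldsymbol{\Sigma}$, provided $N_1$ is sufficiently large.

For constructing the discriminant function, $\boldsymbol{\mu}^{(2)}$ has to be estimated.
Substituting $\bar{\mathbf{x}}^{(2)}$ for $\boldsymbol{\mu}^{(2)}$ we have the discriminant function,

(23) $D(\mathbf{x}; \boldsymbol{\mu}^{(1)}, \bar{\mathbf{x}}^{(2)}; \boldsymbol{\Sigma}) = (\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}\mathbf{x}'.$

An individual with measurements $\mathbf{x}$ is assigned to $P^{(1)}$ or $P^{(2)}$ according as

(24) $D(\mathbf{x}; \boldsymbol{\mu}^{(1)}, \bar{\mathbf{x}}^{(2)}; \boldsymbol{\Sigma}) \lessgtr D(\tfrac{1}{2}[\boldsymbol{\mu}^{(1)} + \bar{\mathbf{x}}^{(2)}]; \boldsymbol{\mu}^{(1)}, \bar{\mathbf{x}}^{(2)}; \boldsymbol{\Sigma}).$
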
7.1. Distribution of $e_{12}(\bar{\mathbf{x}}^{(2)})$. Given $\bar{\mathbf{x}}^{(2)}$, the probability of misclassifying an
individual from $P^{(1)}$ is $1 - G(y)$ where

(25) $y = \tfrac{1}{2}[(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})']^{1/2}.$

We shall denote this probability by $e_{12}(\bar{\mathbf{x}}^{(2)})$. Clearly, $e_{12}(\bar{\mathbf{x}}^{(2)})$ is a random
variable since it depends on $\bar{\mathbf{x}}^{(2)}$. The distribution function of $e_{12}(\bar{\mathbf{x}}^{(2)})$ is given by the
equation

(26) $\Pr(e_{12}(\bar{\mathbf{x}}^{(2)}) < z) = \Pr(4N_2 y^2 > 4N_2[G^{-1}(z)]^2) \quad (0 \le z \le \tfrac{1}{2}).$

Now, $4N_2 y^2$ is a noncentral chisquare variable with $p$ degrees of freedom and
noncentrality³ equal to $(N_2\delta^2)/2$. $\Pr(e_{12}(\bar{\mathbf{x}}^{(2)}) < z)$ can therefore be determined from
tables of the noncentral chisquare distribution⁴.

It is interesting to note that $\boldsymbol{\mu}^{(1)}$, $\boldsymbol{\mu}^{(2)}$ and $\boldsymbol{\Sigma}$ enter into the distribution of
$e_{12}(\bar{\mathbf{x}}^{(2)})$ only in the form of $\delta$. For any given $z$, $\Pr(e_{12}(\bar{\mathbf{x}}^{(2)}) < z)$ is a monotonic
function of $\delta$ and therefore can be asserted to lie between certain bounds
provided we know upper and lower bounds for $\delta$.
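For a concrete feel of the conditional error probability $1 - G(y)$ of Section 7.1, here is a small sketch. It assumes the inverse dispersion matrix is supplied as a nested list `sigma_inv`; the helper names are hypothetical and only the standard library is used.

```python
from statistics import NormalDist


def quad_form(d, sigma_inv):
    """Return d * sigma_inv * d' for a row vector d and a p x p matrix."""
    p = len(d)
    return sum(d[i] * sigma_inv[i][j] * d[j] for i in range(p) for j in range(p))


def e12_given_xbar2(xbar2, mu1, sigma_inv):
    """Probability of misclassifying an individual from P(1), given xbar2,
    when mu(1) and Sigma are known: 1 - G(y) with y as in Section 7.1."""
    d = [a - b for a, b in zip(xbar2, mu1)]
    y = 0.5 * quad_form(d, sigma_inv) ** 0.5
    return 1.0 - NormalDist().cdf(y)
```

For instance, with the identity dispersion matrix, $\boldsymbol{\mu}^{(1)} = (0, 0)$ and $\bar{\mathbf{x}}^{(2)} = (3, 4)$, the distance is 5, so $y = 2.5$ and the error probability is $1 - G(2.5)$.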

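7.2. Expected value of $e_{12}(\bar{\mathbf{x}}^{(2)})$. From the preceding section we see that

(27) $e_{12}(\bar{\mathbf{x}}^{(2)}) = 1 - G(\tfrac{1}{2}[v/N_2]^{1/2})$

where

(28) $v = N_2(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})'.$

The random variable $v$ has the density function

(29) $\tfrac{1}{2}e^{-\lambda - v/2}\sum_{r=0}^{\infty}\dfrac{\lambda^r(\tfrac{1}{2}v)^{p/2 + r - 1}}{r!\,\Gamma(\tfrac{1}{2}p + r)}$

where

(30) $\lambda = \tfrac{1}{2}N_2\delta^2.$

Therefore,

(31) $E e_{12}(\bar{\mathbf{x}}^{(2)}) = \int_0^{\infty}\left\{\int_{\frac{1}{2}(v/N_2)^{1/2}}^{\infty} g(x)\, dx\right\}\tfrac{1}{2}e^{-\lambda - v/2}\sum_{r=0}^{\infty}\dfrac{\lambda^r(\tfrac{1}{2}v)^{p/2 + r - 1}}{r!\,\Gamma(\tfrac{1}{2}p + r)}\, dv = \tfrac{1}{2}e^{-\lambda}\sum_{r=0}^{\infty}\dfrac{\lambda^r}{r!}I_a(\tfrac{1}{2}p + r, \tfrac{1}{2}),$

where $a = 4N_2/(1 + 4N_2)$.

The justification for the last step is the fact that

(32) $[\Gamma(\tfrac{1}{2}p + r)\,2^{p/2 + r}]^{-1}\int_0^{\infty} v^{p/2 + r - 1}e^{-v/2}\, dv\int_{\frac{1}{2}(v/N_2)^{1/2}}^{\infty} g(x)\, dx$

is equal to half the probability that a random variable having the F-distribution
with degrees of freedom one and $p + 2r$ takes a value greater than
$(4N_2)^{-1}(p + 2r)$.

It is possible to give several other expressions for $E e_{12}(\bar{\mathbf{x}}^{(2)})$; the one we have
given above appeared to be the most convenient.

³ Some authors use the term "noncentrality" for twice this number.

⁴ Tables now available are not exactly in the form we require. Editors of [14] have
announced that tables of the probability integral of the noncentral chisquare distribution
are among the tables considered for inclusion in Vol. II. For the present, recourse must be
had to approximate methods developed in [1] and [13].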
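The series for $E e_{12}(\bar{\mathbf{x}}^{(2)})$, a Poisson-weighted sum of the terms $\tfrac{1}{2}I_a(\tfrac{1}{2}p + r, \tfrac{1}{2})$ with $a = 4N_2/(1 + 4N_2)$ and Poisson mean $\tfrac{1}{2}N_2\delta^2$, can be summed numerically. The sketch below is one possible way to do so with the standard library only: the incomplete beta function is obtained by Simpson's rule after the substitution $u = xs^2$, which is a choice made here and not the paper's method, and the function names are hypothetical.

```python
import math


def reg_inc_beta(x, a, b, steps=2000):
    """I_x(a, b) by Simpson's rule after substituting u = x * s**2.
    Assumes 0 <= x < 1 and a >= 1/2, which keeps the integrand bounded."""
    if x <= 0.0:
        return 0.0

    def f(s):
        return 2.0 * x**a * s ** (2.0 * a - 1.0) * (1.0 - x * s * s) ** (b - 1.0)

    h = 1.0 / steps                      # steps must be even for Simpson's rule
    total = f(0.0) + f(1.0)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return total * h / 3.0 / math.exp(log_beta)


def expected_e12(p, n2, delta, tol=1e-12):
    """Sum the series (1/2) e^{-lam} sum_r lam^r / r! * I_a(p/2 + r, 1/2)."""
    lam = 0.5 * n2 * delta**2
    a = 4.0 * n2 / (1.0 + 4.0 * n2)
    total, r = 0.0, 0
    while True:
        if lam > 0.0:
            weight = math.exp(-lam + r * math.log(lam) - math.lgamma(r + 1))
        else:
            weight = 1.0 if r == 0 else 0.0
        total += 0.5 * weight * reg_inc_beta(a, 0.5 * p + r, 0.5)
        if r > lam and weight < tol:
            return total
        r += 1
```

A closed-form check is available when $\delta = 0$ and $p = 2$: since $I_a(1, \tfrac{1}{2}) = 1 - (1 - a)^{1/2}$, the expected error is $\tfrac{1}{2}[1 - (1 + 4N_2)^{-1/2}]$.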

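7.3. Distribution of $e_{21}(\bar{\mathbf{x}}^{(2)})$. Thus far we have been discussing the chances of
wrongly assigning an individual from $P^{(1)}$ to $P^{(2)}$. We now take up consideration
of the probability of wrongly assigning an individual from $P^{(2)}$ to $P^{(1)}$.

Given $\bar{\mathbf{x}}^{(2)}$, the probability of misclassifying an individual from $P^{(2)}$ is $G(w)$
where

(33) $w = [(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})']^{-1/2}[(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\tfrac{1}{2}\{\bar{\mathbf{x}}^{(2)} + \boldsymbol{\mu}^{(1)}\} - \boldsymbol{\mu}^{(2)})'].$

This probability we denote by $e_{21}(\bar{\mathbf{x}}^{(2)})$. Obviously $e_{21}(\bar{\mathbf{x}}^{(2)})$ is a random variable.
We shall derive its distribution.

The distribution function of $e_{21}(\bar{\mathbf{x}}^{(2)})$ is given by the equation

(34) $\Pr(e_{21}(\bar{\mathbf{x}}^{(2)}) < z) = \Pr(w < G^{-1}(z)).$

This equation shows that it suffices to derive the distribution of $w$.
Observe that $w$ is a function of

$(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})'$ and $(\boldsymbol{\mu}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})'$.

Set

(35) $t_1 = (\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})'$

and

(36) $t_2 = (\boldsymbol{\mu}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})'.$

Without loss of generality we may assume that $\boldsymbol{\mu}^{(1)} = \mathbf{0}$ and $\boldsymbol{\Sigma} = \mathbf{I}$. The density
function of $\bar{\mathbf{x}}^{(2)}$ is then

(37) $(N_2/2\pi)^{p/2}\exp[-\tfrac{1}{2}N_2(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(2)})(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(2)})'] = (N_2/2\pi)^{p/2}\exp[-\tfrac{1}{2}N_2(t_1 - 2t_2 + \delta^2)].$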

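Therefore, if we denote by $f(t_1, t_2)$ the joint density function of $t_1$ and $t_2$,

(38) $f(t_1, t_2)\, dt_1\, dt_2 = \int\cdots\int (N_2/2\pi)^{p/2}\exp[-\tfrac{1}{2}N_2(t_1 - 2t_2 + \delta^2)]\, d\bar{\mathbf{x}}^{(2)}$
$\quad = (N_2/2\pi)^{p/2}\exp[-\tfrac{1}{2}N_2(t_1 - 2t_2 + \delta^2)]\int\cdots\int d\bar{\mathbf{x}}^{(2)}$
$\quad = (N_2/2)^{p/2}[\delta\pi^{1/2}\Gamma(\tfrac{1}{2}(p - 1))]^{-1}[t_1 - (t_2/\delta)^2]^{(p-3)/2}\exp[-\tfrac{1}{2}N_2(t_1 - 2t_2 + \delta^2)]\, dt_1\, dt_2,$

the integrals extending over the region $t_1 < \bar{\mathbf{x}}^{(2)}\bar{\mathbf{x}}^{(2)\prime} < t_1 + dt_1$, $t_2 < \boldsymbol{\mu}^{(2)}\bar{\mathbf{x}}^{(2)\prime} < t_2 + dt_2$,
since [21], over the same region,

(39) $\int\cdots\int d\bar{\mathbf{x}}^{(2)} = \pi^{(p-1)/2}[\delta\,\Gamma(\tfrac{1}{2}(p - 1))]^{-1}[t_1 - (t_2/\delta)^2]^{(p-3)/2}\, dt_1\, dt_2.$

Substituting

(40) $u = t_1^{1/2}, \quad w = \tfrac{1}{2}t_1^{1/2} - t_1^{-1/2}t_2,$

we find that the joint density function of $u$ and $w$ is

(41) $C u^{p-1}[1 - \delta^{-2}(w - \tfrac{1}{2}u)^2]^{(p-3)/2}\exp[-\tfrac{1}{2}N_2(2uw + \delta^2)]$

where

(42) $C = 2^{1 - p/2}[\delta\pi^{1/2}\Gamma(\tfrac{1}{2}(p - 1))]^{-1}N_2^{p/2}.$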


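Integrating out $u$ we obtain as the density function of $w$ the function

(43) $h(w) = \begin{cases} C e^{-\frac{1}{2}N_2\delta^2}\int_{2(w-\delta)}^{2(w+\delta)} u^{p-1}[1 - \delta^{-2}(w - \tfrac{1}{2}u)^2]^{(p-3)/2}e^{-N_2 uw}\, du & \text{if } w \ge \delta, \\ C e^{-\frac{1}{2}N_2\delta^2}\int_{0}^{2(w+\delta)} u^{p-1}[1 - \delta^{-2}(w - \tfrac{1}{2}u)^2]^{(p-3)/2}e^{-N_2 uw}\, du & \text{if } -\delta \le w \le \delta, \\ 0 & \text{if } w < -\delta. \end{cases}$

If $p$ is odd, the expression within square brackets in the integrand may be
expanded and each term integrated by parts. For example, if $p = 3$,

(44) $h(w) = \begin{cases} (2N_2)^{1/2}\pi^{-1/2}(\delta w)^{-1}e^{-\frac{1}{2}N_2\delta^2}[e^{-2N_2 w(w-\delta)}\{2(w - \delta)^2 + 2(N_2 w)^{-1}(w - \delta) + (N_2 w)^{-2}\} \\ \quad - e^{-2N_2 w(w+\delta)}\{2(w + \delta)^2 + 2(N_2 w)^{-1}(w + \delta) + (N_2 w)^{-2}\}] & \text{if } w \ge \delta, \\ (2N_2)^{1/2}\pi^{-1/2}(\delta w)^{-1}e^{-\frac{1}{2}N_2\delta^2}[(N_2 w)^{-2} - e^{-2N_2 w(w+\delta)}\{2(w + \delta)^2 \\ \quad + 2(N_2 w)^{-1}(w + \delta) + (N_2 w)^{-2}\}] & \text{if } -\delta < w < \delta \text{ but } w \ne 0, \\ \tfrac{8}{3}(2\pi)^{-1/2}N_2^{3/2}\delta^2 e^{-\frac{1}{2}N_2\delta^2} & \text{if } w = 0, \\ 0 & \text{if } w < -\delta. \end{cases}$
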
For even values of $p$ either recourse must be had to numerical integration or
percentage points must be obtained by interpolation from corresponding
percentage points for distributions with $p$ an odd integer.

Here we observe that $w > -\delta$ with probability one. Therefore, with
probability one

(45) $e_{21}(\bar{\mathbf{x}}^{(2)}) > G(-\delta).$

From equations (27) and (28) we see that $e_{12}(\bar{\mathbf{x}}^{(2)})$, on the other hand, can go
down even to zero.

Observe that besides $N_2$ and $p$, $\delta$ is the only parameter entering into $h(w)$.

7.4. Asymptotic distribution of $e_{21}(\bar{\mathbf{x}}^{(2)})$. Since the exact distribution of $e_{21}(\bar{\mathbf{x}}^{(2)})$
is somewhat complicated, it may be useful to note that, as $N_2 \to \infty$, the
distribution of $2N_2^{1/2}[g(\delta/2)]^{-1}[e_{21}(\bar{\mathbf{x}}^{(2)}) - G(-\delta/2)]$ tends (weakly) to the normal
distribution with mean zero and variance unity.

Perhaps it is better to use the asymptotic distribution of $w$ together with
equation (34). The limiting distribution of $2N_2^{1/2}(w + \tfrac{1}{2}\delta)$ is normal with mean
zero and unit variance. Hence we have, using equation (34),

(46) $\Pr(e_{21}(\bar{\mathbf{x}}^{(2)}) < z) \approx G(2N_2^{1/2}[G^{-1}(z) + \tfrac{1}{2}\delta]).$


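8. Case two: only $\boldsymbol{\Sigma}$ is known.

8.1. Introduction to Case two. Since $\boldsymbol{\mu}^{(1)}$ and $\boldsymbol{\mu}^{(2)}$ are unknown we shall
construct the discriminant function using $\bar{\mathbf{x}}^{(1)}$ and $\bar{\mathbf{x}}^{(2)}$. The resulting discriminant
function is

(47) $D(\mathbf{x}; \bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \boldsymbol{\Sigma}) = (\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\boldsymbol{\Sigma}^{-1}\mathbf{x}'.$

The classification procedure consists in assigning individuals to $P^{(1)}$ or $P^{(2)}$
according as

(48) $D(\mathbf{x}; \bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \boldsymbol{\Sigma}) \lessgtr D(\tfrac{1}{2}[\bar{\mathbf{x}}^{(1)} + \bar{\mathbf{x}}^{(2)}]; \bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \boldsymbol{\Sigma}).$

Given $\bar{\mathbf{x}}^{(1)}$ and $\bar{\mathbf{x}}^{(2)}$, the probability of misclassifying an individual from $P^{(1)}$ is
$1 - G(u_1)$ where

(49) $u_1 = \tfrac{1}{2}\{(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})'\}^{1/2} + \{(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})'\}^{-1/2}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(1)} - \boldsymbol{\mu}^{(1)})'.$

We shall denote this by $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$. Similarly let $e_{21}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$ denote the
conditional probability of misclassifying an individual from $P^{(2)}$. Being functions of
random variables, $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$ and $e_{21}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$ are themselves random
variables. We shall obtain the distribution of $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$. Since we are free to regard
either of the two populations as $P^{(1)}$ it is not necessary to consider $e_{21}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$
separately.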


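8.2. The distribution of $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$. Since

(50) $\Pr(e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) < z) = \Pr(u_1 > -G^{-1}(z)),$

the distribution function of $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$ will be determined when the distribution
of $u_1$ is obtained. In deriving the distribution of $u_1$ we shall assume that $\boldsymbol{\Sigma} = \mathbf{I}$
and $\boldsymbol{\mu}^{(1)} = \mathbf{0}$. Clearly, there is no loss of generality in doing so.

From equation (49) we observe that $u_1$ is a function of $\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)}$ and $\bar{\mathbf{x}}^{(1)}$.
The joint distribution of $\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)}$ and $\bar{\mathbf{x}}^{(1)}$ is multivariate normal with means
$(\boldsymbol{\mu}^{(2)}, \mathbf{0})$ and variance-covariance matrix

(51) $\begin{pmatrix} (N_1^{-1} + N_2^{-1})\mathbf{I} & -N_1^{-1}\mathbf{I} \\ -N_1^{-1}\mathbf{I} & N_1^{-1}\mathbf{I} \end{pmatrix}.$

Therefore, the distribution of $\bar{\mathbf{x}}^{(1)}$, given $\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)} = \mathbf{y}$, is multivariate normal
with mean vector

$-(N_1 + N_2)^{-1}N_2(\mathbf{y} - \boldsymbol{\mu}^{(2)})$

and dispersion matrix $(N_1 + N_2)^{-1}\mathbf{I}$. It follows, therefore, that, given
$\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)} = \mathbf{y}$, $u_1$ has the normal distribution with mean

(52) $\tfrac{1}{2}(N_1 + N_2)^{-1}(N_1 - N_2)(\mathbf{y}\mathbf{y}')^{1/2} + (N_1 + N_2)^{-1}N_2(\boldsymbol{\mu}^{(2)}\mathbf{y}')(\mathbf{y}\mathbf{y}')^{-1/2}$
$\quad = \tfrac{1}{2}(N_1 + N_2)^{-1}(N_1 - N_2)t_3^{1/2} + (N_1 + N_2)^{-1}N_2 t_3^{-1/2}t_4$ (say)

and variance $(N_1 + N_2)^{-1}$. Hence, if $h(u_1)$ denotes the density function of $u_1$ and
$f(t_3, t_4)$ the joint density function of $t_3$ and $t_4$, we have the equation

(53) $h(u_1) = \iint\left(\dfrac{N_1 + N_2}{2\pi}\right)^{1/2}\exp\left[-\dfrac{N_1 + N_2}{2}\left(u_1 - \dfrac{1}{2}\,\dfrac{N_1 - N_2}{N_1 + N_2}\,t_3^{1/2} - \dfrac{N_2}{N_1 + N_2}\,t_3^{-1/2}t_4\right)^2\right]f(t_3, t_4)\, dt_3\, dt_4$

where the region of integration is the entire domain of variation of $t_3$ and $t_4$.
It is thus necessary to obtain the joint density function $f(t_3, t_4)$ of $t_3$ and $t_4$.
This is done in the next section.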

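8.3. The joint density function of $t_3$ and $t_4$.

(54) $t_3 = (\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})' = \mathbf{y}\mathbf{y}';$

(55) $t_4 = \boldsymbol{\mu}^{(2)}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})' = \boldsymbol{\mu}^{(2)}\mathbf{y}'.$

The distribution of $\mathbf{y}$ is multivariate normal with mean $\boldsymbol{\mu}^{(2)}$ and
variance-covariance matrix $(N_1^{-1} + N_2^{-1})\mathbf{I}$. Therefore,

(56) $f(t_3, t_4)\, dt_3\, dt_4 = \left(\dfrac{N_1 N_2}{2\pi(N_1 + N_2)}\right)^{p/2}\int\cdots\int\exp\left[-\dfrac{N_1 N_2}{2(N_1 + N_2)}(\mathbf{y} - \boldsymbol{\mu}^{(2)})(\mathbf{y} - \boldsymbol{\mu}^{(2)})'\right] d\mathbf{y}$
$\quad = \left(\dfrac{N_1 N_2}{2\pi(N_1 + N_2)}\right)^{p/2}\exp\left[-\dfrac{N_1 N_2}{2(N_1 + N_2)}(t_3 - 2t_4 + \delta^2)\right]\int\cdots\int d\mathbf{y}$
$\quad = \left(\dfrac{N_1 N_2}{2(N_1 + N_2)}\right)^{p/2}[\delta\pi^{1/2}\Gamma(\tfrac{1}{2}(p - 1))]^{-1}\exp\left[-\dfrac{N_1 N_2}{2(N_1 + N_2)}(t_3 - 2t_4 + \delta^2)\right][t_3 - (t_4/\delta)^2]^{(p-3)/2}\, dt_3\, dt_4,$

the integrals extending over the region $t_3 < \mathbf{y}\mathbf{y}' < t_3 + dt_3$, $t_4 < \boldsymbol{\mu}^{(2)}\mathbf{y}' < t_4 + dt_4$.

8.4. The distribution of $u_1$ when $N_1 = N_2$. If the two sample sizes $N_1$ and $N_2$
are equal to $N$ (say), the distribution of $u_1$ takes a simpler form. In this case,
given $\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)} = \mathbf{y}$, the conditional distribution of $u_1$ is normal with mean

(57) $\tfrac{1}{2}t_3^{-1/2}t_4 = \tfrac{1}{2}\delta t$ (say)

and variance $(2N)^{-1}$. Therefore, if $f(t)$ is the density function of $t$ we may write

(58) $h(u_1) = \int_{-1}^{1}(N/\pi)^{1/2}\exp[-N(u_1 - \tfrac{1}{2}\delta t)^2]f(t)\, dt.$

It is now necessary to obtain the density function of $t$. For this we have only
to use the joint density function $f(t_3, t_4)$ of $t_3$ and $t_4$ to find the joint density of
$t_3$ and $t$ and integrate out $t_3$ from this joint density. The resulting expression for
$f(t)$ is given by the equation

(59) $f(t) = [\pi^{1/2}\Gamma(\tfrac{1}{2}(p - 1))]^{-1}e^{-N\delta^2/4}(1 - t^2)^{(p-3)/2}\sum_{r=0}^{\infty}\dfrac{\Gamma(\tfrac{1}{2}[p + r])}{r!}N^{r/2}(\delta t)^r.$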
r=() : 

8.5. An alternative derivation of the distribution of $u_1$ in the case $N_1 = N_2$. In
the general case we gave the distribution of $u_1$ as a double integral. When
$N_1 = N_2$, we are able to give the density function of $u_1$ as a single integral. In
this case it is also possible to give the density function of $u_1$ in a more explicit
form using a different method.

Let $\varphi_{u_1}(\theta)$ be the characteristic function of $u_1$. We have seen that, given
$\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)} = \mathbf{y}$, $u_1$ has the normal distribution with mean $(\delta t/2)$ and variance
$(2N)^{-1}$. Therefore, the conditional characteristic function is

(60) $\exp[\tfrac{1}{2}i\delta t\theta - (4N)^{-1}\theta^2].$

Hence we have

(61) $\varphi_{u_1}(\theta) = \int_{-1}^{1}\exp[\tfrac{1}{2}i\delta t\theta - (4N)^{-1}\theta^2]f(t)\, dt.$

For the sake of convenience we now change over from $u_1$ to the variable $v_1$
defined by the equation

(62) $v_1 = (2N)^{1/2}u_1.$

Let $\varphi_{v_1}(\theta)$ denote the characteristic function of $v_1$. Then,

(63) $\varphi_{v_1}(\theta) = \varphi_{u_1}([2N]^{1/2}\theta) = \int_{-1}^{1}\exp[i(N/2)^{1/2}\delta t\theta - \theta^2/2]f(t)\, dt$
$\quad = [\pi^{1/2}\Gamma(\tfrac{1}{2}[p - 1])]^{-1}\exp(-\tfrac{1}{4}N\delta^2 - \tfrac{1}{2}\theta^2)\sum_{r=0}^{\infty}\dfrac{\Gamma(\tfrac{1}{2}[p + r])}{r!}(N^{1/2}\delta)^r\int_{-1}^{1}t^r(1 - t^2)^{(p-3)/2}e^{i(N/2)^{1/2}\delta\theta t}\, dt.$

Expanding the exponential factor of the integrand and integrating term by
term we obtain the following equation for $\varphi_{v_1}(\theta)$:

(64) $\varphi_{v_1}(\theta) = \pi^{-1/2}\exp(-\tfrac{1}{4}N\delta^2 - \tfrac{1}{2}\theta^2)\sum{}'\dfrac{\Gamma(\tfrac{1}{2}[p + r])\,\Gamma(\tfrac{1}{2}[r + m + 1])}{\Gamma(\tfrac{1}{2}[p + r + m])}\,\dfrac{(N^{1/2}\delta)^{r+m}(i\theta)^m}{r!\,m!\,2^{m/2}}$

where $\sum{}'$ denotes summation over all non-negative integral values of $r$ and $m$
such that $r + m$ is an even integer.

The inversion formula for characteristic functions now readily yields an
expression for the density function of $v_1$. This expression is

(65) $2^{-1/2}\pi^{-1}e^{-N\delta^2/4 - v_1^2/2}\sum{}'\dfrac{\Gamma(\tfrac{1}{2}[p + r])\,\Gamma(\tfrac{1}{2}[r + m + 1])}{\Gamma(\tfrac{1}{2}[p + r + m])}\,\dfrac{(N^{1/2}\delta)^{r+m}}{r!\,m!\,2^{m/2}}\,H_m(v_1).$

Here $H_r(x)$ denotes the Hermite polynomial of degree $r$ defined by the equation

(66) $\left(-\dfrac{d}{dx}\right)^r e^{-x^2/2} = H_r(x)e^{-x^2/2}.$
dx 


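8.6. The asymptotic distribution of $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$. As $N_1$ and $N_2$ tend to infinity,
the distribution of $2(N_1^{-1} + N_2^{-1})^{-1/2}[g(\tfrac{1}{2}\delta)]^{-1}[e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) - G(-\tfrac{1}{2}\delta)]$ tends
weakly to the normal distribution with zero mean and unit variance.

Again it may be better to use the asymptotic distribution of $u_1$ together with
equation (50). The limiting distribution of $2(N_1^{-1} + N_2^{-1})^{-1/2}(u_1 - \tfrac{1}{2}\delta)$ is normal
with zero mean and unit variance. Hence we have

(67) $\Pr(e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) < z) \approx G(2(N_1^{-1} + N_2^{-1})^{-1/2}[G^{-1}(z) + \tfrac{1}{2}\delta]).$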


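9. Case three: $\boldsymbol{\mu}^{(1)}$, $\boldsymbol{\mu}^{(2)}$ and $\boldsymbol{\Sigma}$ unknown. In this case the discriminant function
as constructed from sample estimates of $\boldsymbol{\mu}^{(1)}$, $\boldsymbol{\mu}^{(2)}$ and $\boldsymbol{\Sigma}$ is

(68) $D(\mathbf{x}; \bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S}) = (\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}\mathbf{x}'.$

The classification procedure consists in assigning individuals to $P^{(1)}$ or $P^{(2)}$
according as

(69) $D(\mathbf{x}; \bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S}) \lessgtr D(\tfrac{1}{2}[\bar{\mathbf{x}}^{(1)} + \bar{\mathbf{x}}^{(2)}]; \bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S}).$

Given $\bar{\mathbf{x}}^{(1)}$, $\bar{\mathbf{x}}^{(2)}$ and $\mathbf{S}$, the probability of misclassifying an individual from
$P^{(1)}$ is $1 - G(w_1)$ where

(70) $w_1 = [(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}\boldsymbol{\Sigma}\mathbf{S}^{-1}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})']^{-1/2}[(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}(\tfrac{1}{2}\{\bar{\mathbf{x}}^{(1)} + \bar{\mathbf{x}}^{(2)}\} - \boldsymbol{\mu}^{(1)})'].$

We shall denote this probability by $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S})$. Clearly, $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S})$ is
a random variable. The exact distribution of $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S})$ is complicated and
we shall be content with giving its asymptotic distribution. To be slightly more
general, we shall suppose that $\mathbf{S}$ is some estimate of $\boldsymbol{\Sigma}$ with $n$ degrees of freedom
and independent of $\bar{\mathbf{x}}^{(1)}$ and $\bar{\mathbf{x}}^{(2)}$, not necessarily obtained exclusively from the
same samples as those from which $\bar{\mathbf{x}}^{(1)}$ and $\bar{\mathbf{x}}^{(2)}$ were obtained. It can then be shown
that as $N_1$, $N_2$ and $n$ tend to infinity, the distribution of $2[N_1^{-1} + N_2^{-1}]^{-1/2}[g(\tfrac{1}{2}\delta)]^{-1}[e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S}) + G(\tfrac{1}{2}\delta) - 1]$
tends (weakly) to the normal distribution with
mean zero and unit variance. We have also, corresponding to equations (46)
and (67) of previous sections, for large values of $N_1$, $N_2$ and $n$,

(71) $\Pr(e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}; \mathbf{S}) < z) \approx G(2(N_1^{-1} + N_2^{-1})^{-1/2}[G^{-1}(z) + \tfrac{1}{2}\delta]).$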
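The large-sample formulas (46), (67) and (71) share one form, so a single helper can evaluate all three. The sketch below is illustrative only: the name `asymptotic_cdf` is an assumption, and passing `float("inf")` for one sample size is used here as a device to recover the one-sample form of (46), where $(N_1^{-1} + N_2^{-1})^{-1/2}$ reduces to $N_2^{1/2}$.

```python
from statistics import NormalDist

_N = NormalDist()


def asymptotic_cdf(z, delta, n1, n2):
    """Large-sample approximation to the distribution function of a
    conditional error probability: G(2 (1/n1 + 1/n2)^(-1/2) [G^{-1}(z) + delta/2])."""
    scale = 2.0 / (1.0 / n1 + 1.0 / n2) ** 0.5
    return _N.cdf(scale * (_N.inv_cdf(z) + 0.5 * delta))
```

The approximation places the median of the error probability at $G(-\tfrac{1}{2}\delta)$, the error rate of the rule with known parameters: `asymptotic_cdf(G(-delta/2), delta, n1, n2)` returns one half for any sample sizes.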


10. Expected probability of misclassification. 

10.1. Introduction. We have been discussing in previous sections sampling 
fluctuations in the chances of misclassification involved in using estimated dis- 
criminant functions. Distributions and expected values were the objects of in- 
vestigation. The expected values were evaluated in certain special cases using 
ad hoc methods. We shall, in this section, give a unified treatment.⁵ The method
of this section is capable of yielding expected values in all cases where the dis- 
persion matrix is known. 

Besides exact expressions some simple approximations also will be given. 

10.2. Exact expressions. Consider the case where only the variance-covariance
matrix $\boldsymbol{\Sigma}$ is assumed to be known. We require expected values of $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$
and $e_{21}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$. We shall evaluate $E e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$ only, since $E e_{21}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$
can be evaluated on similar lines.

Using results contained in [10], it is possible to prove that

(72) $E e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) = \Pr(v_2^{-1}v_1 > [1 + \rho]^{-1}[1 - \rho])$

where $v_1$ and $v_2$ are independent non-central chisquare variables each having $p$
degrees of freedom and non-centralities given respectively by

(73) $\lambda_1 = [4(1 + \rho)]^{-1}N_1 N_2[(N_1 + N_2)^{-1/2} - (N_1 + N_2 + 4N_1 N_2)^{-1/2}]^2\delta^2$

and

(74) $\lambda_2 = [4(1 - \rho)]^{-1}N_1 N_2[(N_1 + N_2)^{-1/2} + (N_1 + N_2 + 4N_1 N_2)^{-1/2}]^2\delta^2.$

Here

(75) $\rho = [(N_1 + N_2)(N_1 + N_2 + 4N_1 N_2)]^{-1/2}(N_2 - N_1).$

⁵ Though the method of this section has the merit of greater generality, the expressions
obtained in the earlier sections are somewhat simpler.

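Using the expression for the density function of $v_2^{-1}v_1$ we may write

(76) $E e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) = e^{-\lambda_1 - \lambda_2}\sum_{r=0}^{\infty}\sum_{s=0}^{\infty}\dfrac{\Gamma(p + r + s)}{\Gamma(\tfrac{1}{2}p + r)\,\Gamma(\tfrac{1}{2}p + s)}\,\dfrac{\lambda_1^r\lambda_2^s}{r!\,s!}\int_{(1-\rho)/(1+\rho)}^{\infty}u^{p/2 + r - 1}(1 + u)^{-(p + r + s)}\, du.$

We shall put equation (76) in the slightly different form

(77) $E e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) = e^{-(\lambda_1 + \lambda_2)}\left[\sum_{r=0}^{\infty}\sum_{s=0}^{r}\dfrac{\lambda_1^r\lambda_2^s}{r!\,s!}\{1 - I_{\frac{1}{2}(1-\rho)}(\tfrac{1}{2}p + r, \tfrac{1}{2}p + s)\} + \sum_{r=0}^{\infty}\sum_{s=r+1}^{\infty}\dfrac{\lambda_1^r\lambda_2^s}{r!\,s!}I_{\frac{1}{2}(1+\rho)}(\tfrac{1}{2}p + s, \tfrac{1}{2}p + r)\right].$

The various terms on the right hand side in equation (77) can be evaluated using
tables of $I_x(p, q)$ given in [16].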

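In case $\boldsymbol{\mu}^{(1)}$ also is known with $\boldsymbol{\Sigma}$, we have only to take, in equation (77),

(78) $\lambda_1 = [4(1 + \rho)]^{-1}N_2[1 - (1 + 4N_2)^{-1/2}]^2\delta^2, \quad \lambda_2 = [4(1 - \rho)]^{-1}N_2[1 + (1 + 4N_2)^{-1/2}]^2\delta^2,$

and

(79) $\rho = -[1 + 4N_2]^{-1/2}$

to get the expected value of $e_{12}(\bar{\mathbf{x}}^{(2)})$. If the known parameters are $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}^{(2)}$,
we take

(80) $\lambda_1 = [4(1 + \rho)]^{-1}N_1[1 - (1 + 4N_1)^{-1/2}]^2\delta^2, \quad \lambda_2 = [4(1 - \rho)]^{-1}N_1[1 + (1 + 4N_1)^{-1/2}]^2\delta^2,$

and

(81) $\rho = (1 + 4N_1)^{-1/2}.$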
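The constants $\rho$, $\lambda_1$ and $\lambda_2$ of equations (73) to (75) are easy to mistype, so a small numerical cross-check may be useful. The sketch below (function names are illustrative assumptions) computes them for the two-sample case and for the case in which $\boldsymbol{\mu}^{(1)}$ is known; the latter should be the limit of the former as $N_1 \to \infty$.

```python
import math


def rho_lambdas(n1, n2, delta):
    """rho, lambda_1, lambda_2 when only Sigma is known (equations (73)-(75))."""
    s = n1 + n2
    t = n1 + n2 + 4.0 * n1 * n2
    rho = (n2 - n1) / math.sqrt(s * t)
    lam1 = n1 * n2 * (s**-0.5 - t**-0.5) ** 2 * delta**2 / (4.0 * (1.0 + rho))
    lam2 = n1 * n2 * (s**-0.5 + t**-0.5) ** 2 * delta**2 / (4.0 * (1.0 - rho))
    return rho, lam1, lam2


def rho_lambdas_mu1_known(n2, delta):
    """The same constants when mu(1) is known as well (equations (78)-(79))."""
    root = (1.0 + 4.0 * n2) ** -0.5
    rho = -root
    lam1 = n2 * (1.0 - root) ** 2 * delta**2 / (4.0 * (1.0 + rho))
    lam2 = n2 * (1.0 + root) ** 2 * delta**2 / (4.0 * (1.0 - rho))
    return rho, lam1, lam2
```

With equal sample sizes $\rho$ vanishes, and for a very large first sample the two functions agree to within rounding, which is a quick way to confirm that the special case was transcribed consistently with the general one.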
10.3. An approximation. In the previous section we obtained an exact
expression for the expected value of $e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)})$. That expression is not quite
convenient for numerical evaluation. For this reason we now give an approximation
which permits evaluation of the expected value using only tables of $G(x)$.
We start with the result

(82) $E e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) = \Pr(v_2^{-1}v_1 > [1 + \rho]^{-1}[1 - \rho]).$

Now,

(83) $\Pr(v_2^{-1}v_1 > [1 + \rho]^{-1}[1 - \rho]) = \Pr(v_2^{-1/3}v_1^{1/3} > [1 + \rho]^{-1/3}[1 - \rho]^{1/3}) = \Pr([1 + \rho]^{1/3}v_1^{1/3} - [1 - \rho]^{1/3}v_2^{1/3} > 0).$

From [1] we know that if $\chi'^2$ has a noncentral chisquare distribution with $f$ degrees
of freedom and noncentrality parameter equal to $\lambda$, the variable $(\chi'^2/r)^{1/3}$, where

(84) $r = f + 2\lambda,$

has approximately a normal distribution with expectation $1 - 2(1 + b)/9r$
and variance $2(1 + b)/9r$. (Here $b$ stands for $2[f + 2\lambda]^{-1}\lambda$.) Using this result
and also equations (82) and (83) we may now write

(85) $E e_{12}(\bar{\mathbf{x}}^{(1)}, \bar{\mathbf{x}}^{(2)}) \approx G(a)$

where

(86) $a = (18)^{-1/2}[r_1^{-1/3}(1 + \rho)^{2/3}(1 + b_1) + r_2^{-1/3}(1 - \rho)^{2/3}(1 + b_2)]^{-1/2}$
$\quad \times [2\{r_2^{-2/3}(1 - \rho)^{1/3}(1 + b_2) - r_1^{-2/3}(1 + \rho)^{1/3}(1 + b_1)\} + 9\{r_1(1 + \rho)\}^{1/3} - 9\{r_2(1 - \rho)\}^{1/3}],$

the quantities $r_1$, $r_2$, $b_1$ and $b_2$ being defined by the equations

(87) $r_i = p + 2\lambda_i \quad (i = 1, 2), \qquad b_i = 2(p + 2\lambda_i)^{-1}\lambda_i \quad (i = 1, 2).$

The question of the closeness of the approximation (85) now arises. We should 
expect that the approximation involved is of the same order as that involved 
in assuming that the cube root of a noncentral chisquare variable has the normal 
distribution. Though it would be interesting to compare the approximate values 
with the corresponding exact values using numerical computations, we shall not 
embark on this venture at the present moment. Some numerical results given 
in [1] may be found enlightening. 
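The approximation of this section reduces to a few arithmetic steps: form the two approximately normal cube roots, take the difference in (83), and standardize. The sketch below does exactly that rather than transcribing a closed formula, under the assumption that the noncentralities and $\rho$ are those of Section 10.2 for the case where only $\boldsymbol{\Sigma}$ is known. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_two_constants(n1, n2, delta):
    """rho, lambda_1, lambda_2 of Section 10.2 (only Sigma known)."""
    s = n1 + n2
    t = n1 + n2 + 4.0 * n1 * n2
    rho = (n2 - n1) / math.sqrt(s * t)
    lam1 = n1 * n2 * (s**-0.5 - t**-0.5) ** 2 * delta**2 / (4.0 * (1.0 + rho))
    lam2 = n1 * n2 * (s**-0.5 + t**-0.5) ** 2 * delta**2 / (4.0 * (1.0 - rho))
    return rho, lam1, lam2


def approx_expected_e12(p, n1, n2, delta):
    """Cube-root normal approximation to E e12: G(mean / sd) of
    [r1(1+rho)]^(1/3) X1 - [r2(1-rho)]^(1/3) X2, with Xi = (vi / ri)^(1/3)."""
    rho, lam1, lam2 = case_two_constants(n1, n2, delta)
    r1, r2 = p + 2.0 * lam1, p + 2.0 * lam2
    b1, b2 = 2.0 * lam1 / r1, 2.0 * lam2 / r2
    c1 = (r1 * (1.0 + rho)) ** (1.0 / 3.0)
    c2 = (r2 * (1.0 - rho)) ** (1.0 / 3.0)
    var1 = 2.0 * (1.0 + b1) / (9.0 * r1)      # approximate variance of X1
    var2 = 2.0 * (1.0 + b2) / (9.0 * r2)      # approximate variance of X2
    mean = c1 * (1.0 - var1) - c2 * (1.0 - var2)
    sd = math.sqrt(c1**2 * var1 + c2**2 * var2)
    return NormalDist().cdf(mean / sd)
```

Two checks that do not need tables: with $\delta = 0$ and equal sample sizes the two populations and the two cube roots are interchangeable, so the value is exactly one half; and for large equal samples the value should be close to $G(-\tfrac{1}{2}\delta)$.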


11. Estimation of the expected probabilities of misclassification. The expres- 
sions for the expected probabilities of misclassification are found to be functions 
of $\delta$. The problem of estimating these expected probabilities now arises. The
empirical method of estimating the probability of an event by computing the 
proportion of outcomes favourable to the event in a number of repetitions of 
the experiment is available to us. This method is suggested in [20]. If the problem 
is one of estimating the conditional probabilities of misclassification, the empirical 
method is a simple way of solving it. What we have to do is to use the estimated 
discriminant function to classify the $N_i$ individuals known to belong to $P^{(i)}$
and note down the proportion of these individuals assigned to $P^{(j)}$ ($i = 1$ and
$j = 2$ or $i = 2$ and $j = 1$).
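The empirical method just described amounts to reclassifying the two samples with the rule estimated from them. A minimal univariate sketch follows (the function name is an assumption; the multivariate version would replace the midpoint comparison by the estimated discriminant function).

```python
def empirical_error_rates(sample1, sample2):
    """Proportion of each sample that the estimated univariate rule assigns
    to the other population: (estimate of e12, estimate of e21)."""
    xbar1 = sum(sample1) / len(sample1)
    xbar2 = sum(sample2) / len(sample2)
    midpoint = 0.5 * (xbar1 + xbar2)

    def goes_to_p1(x):
        # Same rule as in Section 3: the side of the midpoint nearer xbar1.
        return x < midpoint if xbar2 > xbar1 else x > midpoint

    e12_hat = sum(1 for x in sample1 if not goes_to_p1(x)) / len(sample1)
    e21_hat = sum(1 for x in sample2 if goes_to_p1(x)) / len(sample2)
    return e12_hat, e21_hat
```

For the samples (0, 1, 2, 5) and (4, 6, 7, 3) the means are 2 and 5, the cut-off is 3.5, and one observation in each sample falls on the wrong side, giving the pair (0.25, 0.25).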

If the problem is one of estimating the unconditional probabilities of mis- 
classification, the empirical method can prove exasperating. Fortunately, the 
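maximum likelihood estimator is simple enough. To obtain the maximum
likelihood estimate, all we have to do is to substitute $(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})'$,
or $(\boldsymbol{\mu}^{(2)} - \bar{\mathbf{x}}^{(1)})\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}^{(2)} - \bar{\mathbf{x}}^{(1)})'$ or $(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})'$ or

$(N_1 + N_2 - 2)^{-1}(N_1 + N_2)(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})'$

for $\delta^2$ in the expressions for the expected error (or approximations to them),
depending on which parameters are known.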

The exact value of the expected probability is certain to differ from its estimate. 
An idea of the magnitude of the difference that may be expected can be obtained 
from the variance of the estimator. But the expression for the variance happens
to be quite cumbersome. Besides, it involves unknown parameters. Fortunately 
another approach is open to us. We indicate below how intervals which enclose 
the true value of the probability with any preassigned degree of certainty can 
be constructed. 

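The procedure to be adopted is the following: Suppose the desired confidence
level is $\alpha$. Set up for $\delta$ a confidence interval of confidence coefficient $\alpha$. Suppose
the upper and lower bounds are respectively $\delta_1$ and $\delta_2$. Evaluate the expression
for the expected probability of misclassification substituting in turn $\delta_1$ and $\delta_2$
for $\delta$. The two values thus obtained will enclose the true value of that quantity
with probability $\alpha$.

Confidence bounds for $\delta$ can be set up if we remember that
$N_1(\bar{\mathbf{x}}^{(1)} - \boldsymbol{\mu}^{(2)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(1)} - \boldsymbol{\mu}^{(2)})'$, $N_2(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \boldsymbol{\mu}^{(1)})'$, and
$(N_1 + N_2)^{-1}N_1 N_2(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})'$
have noncentral chisquare distributions with $p$ degrees of freedom and
noncentrality parameters equal respectively to $\tfrac{1}{2}N_1\delta^2$, $\tfrac{1}{2}N_2\delta^2$, and $\tfrac{1}{2}(N_1 + N_2)^{-1}N_1 N_2\delta^2$
and that $[p(N_1 + N_2)(N_1 + N_2 - 2)]^{-1}N_1 N_2(N_1 + N_2 - p - 1)(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})\mathbf{S}^{-1}(\bar{\mathbf{x}}^{(2)} - \bar{\mathbf{x}}^{(1)})'$
has the noncentral F-distribution with degrees of freedom $p$
and $N_1 + N_2 - p - 1$ and noncentrality equal to $\tfrac{1}{2}(N_1 + N_2)^{-1}N_1 N_2\delta^2$.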

For the sake of definiteness, let us suppose that we are dealing with a situation 
where = is known and y” and y” are unknown. Let v be a noncentral chisquare 
variable with p degrees of freedom and noncentrality equal to (N, + N2)™ 
N,N;46°. Set 
(88) F(z) = Pr (v < 2). 


As 6, we can take® the least upper bound of the set of all numbers 6 satisfying the 


* Marakathavalli [12] should also be consulted. She discusses how unbiased critical 
regions can be set up for testing hypotheses specifying the value of the noncentrality 
parameter. Inversion will give an unbiased confidence interval. Methods of approximately 
evaluating the probability integral of the noncentral chisquare density developed in [1] 
and [13] will be required. Tables given in [6] and [12] will be found useful. 
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condition 
(89) F3((Ni + N.]'NiN2(2? — 2#™)271(2? — 2#)’) = 4(1+ a) 


and as 6. we can take the greatest lower bound of the set of all numbers 6 satis- 
fying the condition 


(90) Fs((N, + No] 'NiN2(2® — 2) 27 (2 1(1—a). 


12. About a more general classification procedure.’ There are situations where 
the two kinds of errors are not of equal importance. In some cases it may even 
be possible to determine the different losses consequent on each type of mistake. 
Suppose $c_{12}$ is the loss incurred in assigning an individual from $P^{(1)}$ to $P^{(2)}$ and
$c_{21}$ is the loss incurred in assigning an individual from $P^{(2)}$ to $P^{(1)}$. Suppose further
that the a priori probability of an individual coming from $P^{(1)}$ is $\pi^{(1)}$. Then the
procedure with minimum expected loss is that of assigning an individual with
measurements $x$ to $P^{(1)}$ or $P^{(2)}$ according as

(91) $(\mu^{(2)} - \mu^{(1)})\Sigma^{-1}x' \lessgtr \tfrac{1}{2}(\mu^{(2)} - \mu^{(1)})\Sigma^{-1}(\mu^{(2)} + \mu^{(1)})' + c$

where

(92) $c = \log_e \dfrac{\pi^{(1)} c_{12}}{(1 - \pi^{(1)})\, c_{21}}$
([3], p. 134). The procedure we considered earlier is a special case of this more
general procedure. It corresponds to the case $c = 0$. A sufficient condition for
$c$ to be zero is that $c_{12} = c_{21}$ and $\pi^{(1)} = \pi^{(2)}$ (that is, $\pi^{(1)} = \tfrac{1}{2}$).


The procedure mentioned above can be carried out only if all the parameters
$\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma$ are known. If such is not the case we may assign the individual
to $P^{(1)}$ or $P^{(2)}$ according as

(93) $(\bar{x}^{(2)} - \bar{x}^{(1)})S^{-1}x' \lessgtr \tfrac{1}{2}(\bar{x}^{(2)} - \bar{x}^{(1)})S^{-1}(\bar{x}^{(2)} + \bar{x}^{(1)})' + c$.
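As an illustration of the rule in (91) and (92), the following is a minimal sketch in Python. It assumes the reading that an individual goes to the first population when the left side of (91) does not exceed the right side; the function and argument names are hypothetical and do not come from the paper. Replacing the population means and the inverse dispersion matrix by their sample analogues gives the corresponding sketch for (93).

```python
import math

def quad_form(u, m, v):
    """Return u * M * v' for row vectors u, v and a square matrix M."""
    return sum(u[i] * m[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def assign_population(x, mean1, mean2, cov_inv, prior1, loss12, loss21):
    """Return 1 or 2 under the assumed reading of (91)-(92).

    prior1 is the a priori probability of population 1, loss12 the loss of
    sending a member of population 1 to population 2, loss21 the reverse.
    """
    c = math.log(prior1 * loss12 / ((1.0 - prior1) * loss21))   # (92)
    diff = [b - a for a, b in zip(mean1, mean2)]    # mu(2) - mu(1)
    total = [a + b for a, b in zip(mean1, mean2)]   # mu(2) + mu(1)
    lhs = quad_form(diff, cov_inv, x)
    rhs = 0.5 * quad_form(diff, cov_inv, total) + c
    return 1 if lhs <= rhs else 2

identity = [[1.0, 0.0], [0.0, 1.0]]
# Equal losses and priors give c = 0, the midpoint rule considered earlier.
print(assign_population([1.2, 0.0], [0.0, 0.0], [2.0, 0.0], identity, 0.5, 1.0, 1.0))
# A larger loss12 moves the boundary toward the second mean.
print(assign_population([1.2, 0.0], [0.0, 0.0], [2.0, 0.0], identity, 0.5, 3.0, 1.0))
```

With equal losses and priors the point (1.2, 0) lies past the midpoint and is sent to the second population; tripling the loss for misassigning members of the first population shifts the boundary enough to send it to the first.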


The sampling fluctuations of e2(2™, 2°, S) and ex(x”, x”, S), the conditional 
probabilities of the two kinds of errors, are again of interest. We shall briefly 
discuss the case of = known and give indications of the changes to be made in 
some of the earlier formulae, leaving a fuller discussion of the procedure to 
another occasion. 

The distribution of e.(x"’, x”) is given by the equation 


(94) Pr (e2(2, 2”) <2) = Pr(m > —G@'(z)) 


where 


=(1)\ 774 
E — oy] 
(95) 
' = (1) \ 4p 7-52) = (1) \ g—1 (1) (1) 
— #2") (27 — x (x — yp)’ +c). 
7 The results of earlier sections can be generalized in other ways, which we hope to 
indicate in a later communication. 
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This equation shows that it suffices to obtain the distribution of u,; . The density 
function h(w) of w is 


Ni +N! __ Rica me 1 M—N24 
f[PE) ex | - Fon tw {uF M+ Vo," 


Not 
- (pM. + Nz + c) ay | 0,0) dt; dt, 


(96) 


where f(t; , 4) is to be taken from equation (56). 


If N, , Nz and n are large, we have, corresponding to equation (71), 
Pr (en(#, #7; S) < z) & G(28[(e + 2c)?Ni° 
(97) ‘ ss a - ts 
+ (& — 2c)?Nz' + 2(2cd)*n"|"{G"*(z) + cd? + 46)). 


> ° ° ° ° : r . (i) ; 
If more information is available, the results become simpler. Thus if uw” is 


also known with &, the distribution function of e2(x ) is given by the formula 
(Pr (v < A,(z)) + Pr A2(z)) 
Pr (ep(#”) <2) =¢ ife<1— sen (c = 0) 
re ifz => 1 — G({2c}'), 


where v is a random variable having the noncentral chisquare distribution with 
p degrees of freedom and noncentrality N26 and 


(99) Ai(z) = NG *(z) + {[E"(z)P — 2c)'P 

and 

(100) A(z) = NAG "(z) — {[E"(z)P — 2ch'P. 

Similarly we have, for the distribution function of e,(%°), the equation 
(101) Pr (e(%”) < z) = Pr(w < G‘(z)) 


where the random variable w has the density function h(w) defined below. 


(w) (w) 
' ™ nL +[- it —é (w— hu 
(w) m3; (w) 


) —3 1 —Nouw e ‘ ; 
— cu Payee “2” du if w = 6 + (2c) 


[1-5 “(w-—4u-—cu 


m,(w) 


(102) h(w) = Pe 2c+$2) _ 2 “2y2ye-®) (c 2 0) 


weer dus if(2c)'} -ssw< < (2c)'+6 


ifw< (2c)? — 6 
Here 







$m_1(w) = w + \delta - [(w + \delta)^2 - 2c]^{1/2}$,
$m_2(w) = w - \delta - [(w - \delta)^2 - 2c]^{1/2}$,
$m_3(w) = w - \delta + [(w - \delta)^2 - 2c]^{1/2}$,
$m_4(w) = w + \delta + [(w + \delta)^2 - 2c]^{1/2}$,


and C is the C of equation (42). 
Observe that with probability one, 


(104) en(%”) < 1 — G([2c}') 

and 

(105) en(x™) = G([2e]' — 4) (c2 
If c < 0, we have, for the distribution function of e2(%”), the equation 

(106) Pr (e(%”) < 2) = Pr(v > A2(z)) 


instead of equation (98) and, for the density function of w, the equation 


m,(w) 


h(w) = Ce** mre (1 —s°(w—3u 
m3 (w) 


(107) 


- ay syie-o p-1,—Nauw ] 


— cu U du 


instead of equation (102). 
Note that statements corresponding to equations (104) and (105) cannot be
made if $c < 0$.
For Ee.(x°’, x) we have the following result: 


(108) Eew (x, x”) = | fil 
d 


where fi(z) is the function defined by equation (4.9) of [10] and 
(109) d = 4N\N2(N, + Nz) *(Ni + No + 4NiN2)*c. 
As the expression for f;(z) is complicated we shall not reproduce it here. 
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ON THE MONOTONIC CHARACTER OF THE POWER 
FUNCTIONS OF TWO MULTIVARIATE TESTS! 


By S. N. Roy and W. F. Mikhail
University of North Carolina 


1. Summary and introduction. The largest characteristic root has been pro- 
posed in [2] as a test statistic in (i) the multivariate analysis of variance test, 
and (ii) testing that two sets of variates are independent. In this paper it is 
shown that, in each case, the power function is a monotonically increasing func- 
tion of each non-centrality parameter, separately. This property was stated in 
[2] without proof. This provides a stronger result than would be obtained by any 
direct use of Anderson’s Theorem [1] which implies that the power function 
increases when all the roots are simultaneously increased in the same ratio. The 
proof of the monotonicity property for the multivariate analysis of variance is
given in Section 3, and in Section 4 it is shown how the proof is modified for 
testing independence between two sets of variates.” 


2. Preliminaries for the multivariate analysis of variance situation. Let u, s
and n denote respectively the "effective" number of variates, the degrees of
freedom of the hypothesis and the degrees of freedom for the error and let
$t = \min(u, s)$. Then, with $X = [x_{ij}]: u \times s$ and $Y = [y_{ij}]: u \times n$, the canonical
form for the d.f. in the multivariate analysis of variance model was obtained
in [2, p. 86] to be


(1/(2e)""™} 


(2.1) tk 9) Re + Blow @;)° + > we? Ty 


i=l i=t+l] i=] j#i=1 


T dx; WT A dys; = Q dX dY, 
=] j=] i=] 
and the acceptance region of size $1 - \alpha$ for the linear hypothesis $H_0$ of analysis
of variance, i.e., for the case $\theta_1 = \theta_2 = \cdots = \theta_t = 0$, can be expressed as

(2.2) $D = \{X, Y : c_M[(XX')(YY')^{-1}] \le \mu\}$,

where $\mu$ is given by

(2.3) $P[X, Y \in D \mid \theta_i\text{'s} = 0] = 1 - \alpha$,

and where the $\theta_i$'s are the non-centrality parameters defined in [2, pp. 85-86],
and $c_M(A)$ denotes the largest characteristic root of the (square) matrix $A$.
The problem now is to prove that the integral $P[X, Y \in D] = \int_D Q\, dX\, dY$ is a
monotonically decreasing function of each $\theta_i$, separately. If we regard the
domain $D$ as one of dimensionality $us + un$, where $us$ dimensions are associated
with $X$ and $un$ with $Y$, then it is clear that we can rewrite this integral as


(2.4) $\int_{D^*} \text{const.} \exp\left[-\tfrac{1}{2}\left(\sum_{i=1}^{u}\sum_{j=1}^{n} y_{ij}^2 + \sum_{i=1}^{u}\sum_{j=1}^{s} x_{ij}^2\right)\right] dX\, dY = \int_{D^*} Q^*\, dX\, dY$, (say),

where $D^*$ is merely the domain $D$ translated by $\theta_i$ along $x_{ii}$ (with $i = 1, 2, \ldots, t$).
Notice that if, in the integral (2.4), we replace the domain $D^*$ by $D$, the
integral over the new domain becomes equal to $1 - \alpha$, where $\alpha$ is the
probability of the first kind of error. Let


(2.5) $(YY')^{-1} = V'V$,

where $V$ is a $u \times u$ triangular matrix with zeros above the main diagonal.
Observe that

$c_M[(XX')(YY')^{-1}] = c_M[(XX')(V'V)] = c_M[(VX)(VX)']$,

and rewrite (2.2), that is, the domain $D$ as

(2.6) $D = \{Y, X : c_M[VX(VX)'] \le \mu\}$.


Notice that V is a function of Y given by (2.5). 

The problem now can be rephrased in the following way. How does the integral
of $Q^*$ given by (2.4) over the domain $D$ given by (2.6) change under
successive translations of $\theta_1$ along $x_{11}$, of $\theta_2$ along $x_{22}$, $\ldots$, $\theta_t$ along $x_{tt}$? It is
clear that the successive changes are cumulative. It will be also seen from the
mechanics of the demonstration that if we can prove that the integral decreases
for the first shift of $D$, namely, by $\theta_1$ along $x_{11}$, then the general theorem itself
will be proved.


3. Proof of the monotonicity property for the multivariate analysis of variance 
situation. The proof is developed in three main steps discussed in the following 
subsections. 

3.1 The proof for the univariate case. In this case, $u = 1$ and we can drop the
first subscript in $X$, $Y$. The domain $D$ of (2.6) now takes on the form

(3.1.1) $D = \left\{X, Y : \sum_{j=1}^{s} x_j^2 \le \mu \sum_{j=1}^{n} y_j^2\right\}$

and the integral of $Q$ over this domain takes the final form

(3.1.2) $\int_{D^*} \exp\left[-\tfrac{1}{2}\left(\sum_{j=1}^{n} y_j^2 + \sum_{j=1}^{s} x_j^2\right)\right] \prod_{j=1}^{n} dy_j \prod_{j=1}^{s} dx_j$.

Notice that now $D^*$ is just a shift of $D$ along $x_1$ by $\theta$. It is evident from the
form of (3.1.2) that the integral (3.1.2) decreases under this shift if, for any
given set of $y_j$'s for $j = 1, 2, \ldots, n$, and $x_j$'s, for $j = 2, 3, \ldots, s$,

(3.1.3) $\int_{-a+\theta}^{a+\theta} \exp\left[-\tfrac{1}{2} x_1^2\right] dx_1 < \int_{-a}^{a} \exp\left[-\tfrac{1}{2} x_1^2\right] dx_1$,

where $a = +\left[\mu \sum_{j=1}^{n} y_j^2 - \sum_{j=2}^{s} x_j^2\right]^{1/2}$; it is clear that it doesn't matter whether
we take $\theta$ to be positive or negative. It is easy to verify this and also an even
more general result, namely that

(3.1.4) $\int_{-a+h}^{a+h} \phi(x)\, dx \le \int_{-a}^{a} \phi(x)\, dx$,

for all real $h$ and $a > 0$, where $\phi(x)$ is a continuous function of $x$, symmetric
about 0 and monotonically decreasing with $|x|$. It is also clear that the left
side of (3.1.3) monotonically decreases with $|\theta|$.
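The inequality (3.1.3) can be checked numerically. The sketch below is only an illustration for the Gaussian kernel $\exp(-x^2/2)$, using the error function from the standard library; the function names and the particular values of $a$ and of the shift are arbitrary choices, not taken from the paper.

```python
import math

def gaussian_mass(lo, hi):
    """Integral of exp(-x**2 / 2) over [lo, hi], via the error function."""
    scale = math.sqrt(math.pi / 2.0)
    return scale * (math.erf(hi / math.sqrt(2.0)) - math.erf(lo / math.sqrt(2.0)))

def shifted_mass(a, shift):
    """Left side of (3.1.3): mass of the interval [-a + shift, a + shift]."""
    return gaussian_mass(-a + shift, a + shift)

a = 1.5
for shift in (0.0, 0.5, 1.0, 2.0, 4.0):
    # The printed values shrink as |shift| grows, as (3.1.3) asserts.
    print(shift, round(shifted_mass(a, shift), 6))
```

The same values are obtained for negative shifts, which reflects the remark that the sign of $\theta$ does not matter.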

3.2. The nature of the multivariate domain (2.6). We now characterize $D$ as a
domain in $(x_{11}, x_{21}, \ldots, x_{u1})$ for fixed values of $V$, $\mu$ and $X$ (excluding the first
column). Toward this end, put $X^* = VX$ and observe that, if $\nu$ is any
characteristic root of $X^*X^{*\prime} = S^*$, then

(3.2.1) $|S^* - \nu I| = 0 = |S^*| - \nu\, \mathrm{tr}_{u-1} + \nu^2\, \mathrm{tr}_{u-2} - \cdots + (-1)^u \nu^u$,

where $\mathrm{tr}_j$ is the trace of the $j$th order, or in other words, the sum of the $j$-rowed
principal minor determinants of $S^*$. But, given $x^*_{ij}$ (for $i = 1, 2, \ldots, u$; $j =
2, \ldots, s$), $|S^*| = |X^*X^{*\prime}|$ is a homogeneous quadratic function of $(x^*_{11}, \ldots, x^*_{u1})$
+ a constant which is really a function of the other $x^*_{ij}$'s just mentioned. The
coefficients of the quadratic function are also each a polynomial function of
$x^*_{ij}$'s (for $i = 1, 2, \ldots, u$; $j = 2, 3, \ldots, s$). Likewise, if we take any $q$-rowed
principal minor determinant of $S^*$, say the one with rows and columns
numbered $1, 2, \ldots, q$, then that determinant is

$\left| \begin{pmatrix} x^*_{11} & \cdots & x^*_{1s} \\ \vdots & & \vdots \\ x^*_{q1} & \cdots & x^*_{qs} \end{pmatrix} \begin{pmatrix} x^*_{11} & \cdots & x^*_{q1} \\ \vdots & & \vdots \\ x^*_{1s} & \cdots & x^*_{qs} \end{pmatrix} \right|$

which, given the other $x^*_{ij}$'s, is a homogeneous quadratic function of $(x^*_{11}, \ldots, x^*_{q1})$
(in which the coefficients are polynomials in the other $x^*_{ij}$'s) + a constant
which is really a function of the other $x^*_{ij}$'s. Thus, given $\nu$ and the other $x^*_{ij}$'s,
the equation (3.2.1) in $\nu$ yields a homogeneous quadric surface in $x^*_{11}, \ldots, x^*_{u1}$.
Now recall from (2.5) that, given $y_{ij}$'s, that is $V$, the $(x^*_{11}, \ldots, x^*_{u1})$ are linear
functions of $(x_{11}, \ldots, x_{u1})$ and likewise $(x^*_{1j}, \ldots, x^*_{uj})$ are linear functions
of $(x_{1j}, \ldots, x_{uj})$, for $j = 2, \ldots, s$. Thus, given $\nu$ and

$x_{ij}$ (for $i = 1, 2, \ldots, u$; $j = 2, \ldots, s$),

the equation (3.2.1) yields a homogeneous quadric surface in $(x_{11}, \ldots, x_{u1})$ in
which the coefficients and the constant term are all functions of $\nu$, $Y$ and the
other $x_{ij}$'s already referred to. This is for any characteristic root $\nu$.







Let us now rewrite (2.2) in the (almost everywhere) equivalent form

(3.2.2) $D = \left\{X, Y : \sup_a \dfrac{(x_{11} + a_2 x_{21} + \cdots + a_u x_{u1})^2 + \cdots + (x_{1s} + a_2 x_{2s} + \cdots + a_u x_{us})^2}{(y_{11} + a_2 y_{21} + \cdots + a_u y_{u1})^2 + \cdots + (y_{1n} + a_2 y_{2n} + \cdots + a_u y_{un})^2} \le \mu \right\}$,

where $a' = (a_2, \ldots, a_u)$. Now, given $\mu$, $Y$ and $x_{ij}$'s (for $i = 1, \ldots, u$; $j =
2, \ldots, s$), (3.2.2) represents the domain of $(x_{11}, \ldots, x_{u1})$ in a $u$-dimensional
Euclidean space, the boundary being given by the surface defined by the equality
sign. An equivalent form of the same surface is the homogeneous quadric
associated with (3.2.1) after $\nu$ is replaced by $\mu$. Next, it is easy to check from the
definition of $D$ and the manner in which the vector $(x_{11}, \ldots, x_{u1})$ occurs in it
that (3.2.2) implies the following:

(i) $(X, Y) \in D$ implies that $((c x_1, X_2), Y) \in D$ for $0 \le c \le 1$, where $X$ is
decomposed into $(x_1, X_2)$ such that $x_1' = (x_{11}, \ldots, x_{u1})$ and $X_2$ is a matrix
with the other elements of $X$.

(ii) Any straight line passing through the origin $(0, \ldots, 0)$ has an
intersection with the domain, of finite length.

Thus, given $\mu$, $Y$ and the other $x_{ij}$'s (already described), (2.2) or (2.6) can
be regarded as a domain of $(x_{11}, \ldots, x_{u1})$ which is the interior of a
$u$-dimensional ellipsoid whose boundary is given by (3.2.1) after $\mu$ is substituted for $\nu$.
It is well known that there is an orthogonal transformation by which the
ellipsoid can be referred to principal axes, or in other words, the transformed
equation to the surface becomes free from the product terms in the transformed
variables and involves only the square terms with positive coefficients. Let
$x_1' = [x_{11}, \ldots, x_{u1}]$ and

(3.2.3) $z = L x_1$,

where $L: u \times u$ is an orthogonal matrix that transforms the ellipsoid into
principal axes. This $L$ can be determined and the rows of $L$, say $(l_{i1}, \ldots, l_{iu})$, $i =
1, 2, \ldots, u$, are the direction cosines of the different principal axes. Note that
$z'z = x_1'x_1$. It would be useful to rewrite (2.4), after substitution of $D$ for $D^*$
and omission of the constant, in the form

(3.2.4) $\int_D \exp\left[-\tfrac{1}{2}\left(\sum_{i=1}^{u}\sum_{j=1}^{n} y_{ij}^2 + \sum_{i=1}^{u}\sum_{j=2}^{s} x_{ij}^2 + \sum_{i=1}^{u} z_i^2\right)\right] dY \prod_{i=1}^{u}\prod_{j=2}^{s} dx_{ij} \prod_{i=1}^{u} dz_i$,

where, given $\mu$, $Y$ and the $x_{ij}$'s, the domain $D$, as a domain in $(z_1, \ldots, z_u)$,
forms the interior of an ellipsoid referred to principal axes (that is, in a form
which is free from the product terms of $z$'s and involves only the square terms
with positive coefficients). In other words, $D$ is symmetric about the origin in
each $z_i$ separately. A displacement $\theta_1$ along the direction of $x_{11}$ might be regarded
as the resultant of a displacement $l_{11}\theta_1$ along $z_1$, that is, along the direction with
cosines $(l_{11}, l_{12}, \ldots, l_{1u})$, a displacement $l_{21}\theta_1$ along $z_2$, that is, along the
direction with cosines $(l_{21}, l_{22}, \ldots, l_{2u})$, and so on, and finally a displacement $l_{u1}\theta_1$
along $z_u$, that is, along the direction with cosines $(l_{u1}, \ldots, l_{uu})$. It should be
remembered that these $l_{ij}$'s are functions of $\mu$, $Y$ and the $x_{ij}$'s of (3.2.4).

3.3 The final step in the proof of the monotonicity property. Looking at (3.2.4)
and using (3.1.4) we observe that a displacement of $D$ by $l_{11}\theta_1$ along $z_1$ will
decrease the integral under (3.2.4), because, for any given set $\mu$, $Y$, $x_{ij}$'s and
$z_2, z_3, \ldots, z_u$,

$\int_{-a + l_{11}\theta_1}^{a + l_{11}\theta_1} \exp\left[-\tfrac{1}{2} z_1^2\right] dz_1 < \int_{-a}^{a} \exp\left[-\tfrac{1}{2} z_1^2\right] dz_1$,

where $a$ and $l_{11}\theta_1$, without any loss of generality, can be assumed to be positive.
Recall that $a$ is a function of $\mu$, $Y$, $x_{ij}$'s and $z_2, \ldots, z_u$. Using the same
argument for successive displacements by $l_{21}\theta_1$ along $z_2$, by $l_{31}\theta_1$ along $z_3$, and so on,
and finally by $l_{u1}\theta_1$ along $z_u$ we have successive decreases of the integral. In
other words, the resultant displacement which is along $x_{11}$ and by $\theta_1$ decreases
the integral. At this point we go back to the integral over $D$ of $Q^*$, forget about
the $z_i$'s, use the result just stated about a displacement by $\theta_1$ along $x_{11}$, apply
successive displacements by $\theta_2$ along $x_{22}$, $\theta_3$ along $x_{33}$ and so on, and finally $\theta_t$
along $x_{tt}$ and eventually obtain an integral over the displaced domain $D^*$ which
is less than the one over the original domain $D$. It is also clear from the mechanics
of the proof that the integral over $D^*$ decreases as each $|\theta_i|$, $i = 1, 2, \ldots, t$,
increases separately. This proves the monotonicity property.
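The monotonicity property can be illustrated by simulation. The following is a rough Monte Carlo sketch, not part of the proof, under several assumptions of my own: $u = 2$, $s = 3$, $n = 10$, only $\theta_1$ nonzero, a cutoff $\mu$ taken as an empirical quantile of the null draws, and the same random draws reused for every shift. All function names are hypothetical.

```python
import math
import random

def largest_root(X, Y):
    """Largest root of (XX')(YY')^{-1} when X and Y each have two rows."""
    def gram(M):
        return (sum(v * v for v in M[0]),
                sum(v * w for v, w in zip(M[0], M[1])),
                sum(w * w for w in M[1]))
    a11, a12, a22 = gram(X)
    b11, b12, b22 = gram(Y)
    det_a = a11 * a22 - a12 * a12
    det_b = b11 * b22 - b12 * b12
    # det(A - lam * B) = det_b * lam**2 - p * lam + det_a
    p = a11 * b22 + a22 * b11 - 2.0 * a12 * b12
    disc = max(p * p - 4.0 * det_a * det_b, 0.0)
    return (p + math.sqrt(disc)) / (2.0 * det_b)

def acceptance_curve(thetas, s=3, n=10, reps=4000, alpha=0.05, seed=1):
    """Estimated probability of falling in D after shifting x_11 by theta."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        X = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(2)]
        Y = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]
        draws.append((X, Y))
    null_stats = sorted(largest_root(X, Y) for X, Y in draws)
    mu = null_stats[round((1.0 - alpha) * reps) - 1]   # empirical cutoff
    curve = []
    for theta in thetas:
        inside = 0
        for X, Y in draws:
            shifted = [[X[0][0] + theta] + X[0][1:], X[1]]
            if largest_root(shifted, Y) <= mu:
                inside += 1
        curve.append(inside / reps)
    return curve

print(acceptance_curve([0.0, 2.0, 4.0]))
```

The estimated acceptance probabilities start near $1 - \alpha$ at $\theta_1 = 0$ and fall as the shift grows, in line with the theorem.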


4. The case of the test for independence between two sets of variates. With
a $(p + q)$ set $(p \le q)$ of variables let us assume, for a sample of size $n + 1$
$(> p + q)$, the canonical distribution law ([2], p. 68)


Pp 
| 1/(2e)™ II (1 ca a 
i=l 


P 


(4.1) . exp | - 9 yur es a (Hs -r Yis a 20; Xi; Yj) + > > vit | 


‘ i=p+l j=l 


TET ae FLT as, 


t=] j=l = 


where $\rho_i$'s are the population canonical correlation coefficients. The hypothesis
of independence $H_0$ is equivalent to the hypothesis that $\rho_i$'s $= 0$; the acceptance
region (of size $1 - \alpha$) for $H_0$ is

(4.2) $D = \{X, Y : c_M[(XX')^{-1}(XY')(YY')^{-1}(YX')] \le \mu\}$,

where $\mu$ is given by

$P[X, Y \in D \mid H_0] = 1 - \alpha$.


The monotonicity in this case is proved in exactly the same way as in the 
previous case. For this purpose we rewrite the D of (4.2) as 
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(4.3) “o )%, ¥: ew{(UU’)(VV’)*} <; u 


and the d-f. of (4.1) as 


const. exp | -: 


(4.4) 


where 


(4.5) rs 

(for j = ee he : 2,:-:,p), andy; = 0, 
otherwise, and where T:¢ X g, U:p X gq, and V:p X (n — q) are related to 
(X:p X n, Y:q X n) in the following way: 


(4.6) Y = TL, 


where L:q X n is orthonormal and T is lower triangular. M:(n — q) X nis an 
orthogonal completion of L, D,, ; p X p stands for a diagonal matrix with 
diagonal elements a; , @2,--- , a), and U and V are given by 


(4.7) U=Da.» XL’,  V = Da_,) XM’. 


In the transformation from (4.1) to (4.4), M does not occur explicitly, L does, 
but is easily integrated out as in [2, pp. 196-197]. 

The probability of the second kind of error is given by integrating (4.4) over 
the domain (4.3). It is easy to see that, aside from the positive constant factor, 
this is equivalent to 


q i p pon q 
4s) [exp E ( Se+Fte2+t F 3) | au av [I tt‘ at 
p* 2 \i=1 j=l i=l j=l i=l j=gq+l i=l 
where, for any given set of $T$ and $V$, $D^*$ is just $D$ displaced by $\theta_1 t_{11}$ along $u_{11}$, by
$\theta_2 t_{21}$ along $u_{21}$ and $\theta_2 t_{22}$ along $u_{22}$, and so on, and finally by $\theta_p t_{p1}$ along $u_{p1}$, $\theta_p t_{p2}$ along
$u_{p2}$, $\ldots$, $\theta_p t_{pp}$ along $u_{pp}$. Notice that when $H_0$ is true, that is, when $\theta_i = 0$, we
should have $D^*$ replaced by $D$ in the integral (4.8). Using the same kind of
argument as in Section 3 it follows that, for any given $T$, the partial integral
over $U$ and $V$ decreases as $D$ is displaced by $\theta_1 t_{11}$ along $u_{11}$, where $t_{11} > 0$, almost
everywhere, and with this displacement of $D$, it is easy to see that the total
integral (if we now integrate over $T$) will also decrease. From considerations of
symmetry, the same result would follow for the other displacements in that the
displacement associated with any $\theta_i$ could be represented as a $\theta_i t_{11}$ along $u_{11}$ under a
suitable transformation. Thus (4.8) monotonically decreases as each $|\theta_i|$, that
is, each $|\rho_i|$ separately increases.


Concluding remarks. The power functions of the A-criteria for the multi- 
variate linear hypothesis and for the test of independence between two sets of 
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variates have also somewhat similar monotonicity properties that will be dis- 
cussed in a subsequent paper. 


REFERENCES 

[1] T. W. Anderson, "The integral of a symmetrical unimodal function," Proc. Amer.
Math. Soc., Vol. 6 (1955), pp. 170-176.

[2] S. N. Roy, Some Aspects of Multivariate Analysis, John Wiley and Sons, New York, 1957.

[3] S. N. Roy and R. Gnanadesikan, "Further contributions to multivariate confidence
bounds," Biometrika, Vol. 44 (1957), pp. 339-410.

[4] S. N. Roy and R. Gnanadesikan, "Some contributions to ANOVA in one or more
dimensions: Parts I and II," Ann. Math. Stat., Vol. 30 (1959), pp. 304-340.





THE MOMENTS OF ELEMENTARY SYMMETRIC FUNCTIONS 
OF THE ROOTS OF A MATRIX IN MULTIVARIATE 
ANALYSIS! 


By Tito A. Mijares
Harvard University and University of the Philippines? 


1. Introduction and summary. Pillai and Mijares [7] gave the exact expressions 
for the first four moments of the sum of s non-zero roots of a matrix occurring 
in multivariate normal analysis as studied independently by R. A. Fisher [3], 
P. L. Hsu [4] and S. N. Roy [9]. In this paper some properties of completely
homogeneous symmetric functions and certain determinantal results (Section 2) 
are used to give an inverse derivation of those moments (Section 4). The method 
is further extended to the moments in general of elementary symmetric functions 
(e.s.f.) of the roots of a matrix in multivariate analysis (Section 6) through the 
use of certain properties of compound matrices (Section 5). 


2. Some results to be used in Sections 4 and 6. Define the completely
homogeneous symmetric function (c.h.s.f.) of the $p$th degree in $k$ arguments by

(2.1) $\phi_p(x_1, \ldots, x_k) = \sum_{P_{(p)}} x_1^{p_1} x_2^{p_2} \cdots x_k^{p_k}$,

where $\sum$ extends over all partitions $P_{(p)}$ of a non-negative integer $p = \sum_{i=1}^{k} p_i$.
Define further $\phi_0 = 1$ and $\phi_{p'} = 0$ if $p' < 0$.

LEMMA 1.

$\phi_p(x_1, \ldots, x_k) = \phi_p(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k) + x_i \phi_{p-1}(x_1, \ldots, x_k)$.

PROOF. Partition $\phi_p(x_1, \ldots, x_k)$ into two groups, one group to contain $x_i$
and the other group not to contain $x_i$. Factor out $x_i$ from the first group and
take the sum of the two groups.

LEMMA 2.

$(x_{i+j} - x_i)\,\phi_{p-1}(x_1, \ldots, x_k) = \phi_p(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k) - \phi_p(x_1, \ldots, x_{i+j-1}, x_{i+j+1}, \ldots, x_k)$.

PROOF. Use Lemma 1 separately for the $x_{i+j}$ and $x_i$ arguments on
$\phi_p(x_1, \ldots, x_k)$, take the difference of the two resulting equations, and
transpose the term $(x_{i+j} - x_i)\phi_{p-1}(x_1, \ldots, x_k)$ to the left of the equality sign.

THEOREM. Let $r_1 \ne r_2 \ne \cdots \ne r_k$ be non-negative powers of the $x$'s in the
successive columns of the $k$-order determinant given below and let $\phi_j$ be the c.h.s.f.
in all $k$ arguments. Then

(2.2) $|x_{k-i+1}^{r_j}| = D\, |\phi_{r_j-k+i}|$, $i, j = 1, \ldots, k$,

where

(2.3) $D = |x_{k-i+1}^{k-j}|$

and $x_{k-i+1}^{r_j}$, $\phi_{r_j-k+i}$, $x_{k-i+1}^{k-j}$ are the $(i, j)$th elements of the square matrices $(x_{k-i+1}^{r_j})$,
$(\phi_{r_j-k+i})$, $(x_{k-i+1}^{k-j})$, respectively.
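Lemma 1, Lemma 2 and the determinantal identity of the theorem can be spot-checked with exact integer arithmetic. The sketch below is only an illustration; it assumes the reading of (2.2) and (2.3) in which row $i$ carries the argument $x_{k-i+1}$, and the helper names (chsf, det, theorem_sides) are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

def chsf(p, xs):
    """Completely homogeneous symmetric function of degree p in xs."""
    if p < 0:
        return 0
    return sum(prod(c) for c in combinations_with_replacement(xs, p))

def det(M):
    """Determinant by Laplace expansion along the first row (small M only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def theorem_sides(xs, rs):
    """Return the left side of (2.2) and the product D * |phi| on the right."""
    k = len(xs)
    rows = range(1, k + 1)
    lhs = det([[xs[k - i] ** r for r in rs] for i in rows])
    D = det([[xs[k - i] ** (k - j) for j in rows] for i in rows])
    H = det([[chsf(r - k + i, xs) for r in rs] for i in rows])
    return lhs, D * H

xs = [2, 3, 5]
# Lemma 1 with x_i = x_1 and p = 4.
print(chsf(4, xs) == chsf(4, xs[1:]) + xs[0] * chsf(3, xs))
# Lemma 2 with x_i = x_1 and x_{i+j} = x_3, p = 4.
print((xs[2] - xs[0]) * chsf(3, xs) == chsf(4, xs[1:]) - chsf(4, xs[:2]))
# Theorem with powers r = (5, 3, 0).
print(theorem_sides(xs, [5, 3, 0]))
```

The two numbers returned by theorem_sides agree, which is the content of (2.2) for this particular choice of arguments and powers.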

Proor. Perform the following elementary operations on the left determinant 
of (2.2): ith row — kth row, i = 1, --- , k — 1. Use Lemma 2 for two argu- 
ments to factor out 2;%~;4; — x; from the ith row. The result is 


k 

IT (25 — 41) + |\br;’-440°(01, Te-vgs)|, 0,9’ = 1,°-- ,k, 

j=? 
where ¢,;/4+i7(21, Te-i’41) 18 the (1’, 7’)th element of br ;'—e4i' (Tr , Tear 41)|. 
Thus, the determinantal expression in (2.4) has c.h.s.f. in two arguments for 
its elements, except for the elements in the last row which have 2; only as argu- 
ment. 

Next, repeat the operation on the determinant of (2.4) with ith row — 
(k — 1)th row, i = 1, --- , & — 2 and use Lemma 2 for three arguments this 
time in factoring out 2,~;4: — 22 from the ith row as the resulting determinant. 
Repeat the operation until finally we have lst row — 2nd row and the same 
lemma is used but for k arguments. After factoring out 2, — 2;-, from the last 
determinant, the expression (2.4) reduces finally to 


(2.5) I] EF 25) + \br jeer (2 as PRG Xe; 41)], 


i>j 


with the z’th row of elements ¢,;'.+4; containing arguments 2 , T2, -** , Te-w4i . 

The product [],5; (2; — x;) is equal to a determinant of Vandermonde and 
given directly by (2.3). The determinantal part of (2.5) can be reduced into a 
determinant in the ¢’s, with complete arguments 2; , --- , 2% , by the repeated 
application of Lemma 1. Take any element at the intersection of the 7th row, 
i = 2,--- ,k, and a given jth column of the determinant in (2.5), and note that 


: Gr ;—h+i( Tr, °° * » Le—i4t) + Le-izebr;—e+ia(Ti, °°", Te-i42) 
(2.6) 
— bre +i( Lr po. Le-i42) 


after using Lemma 1 for k — i + 2 arguments. Hence, perform ith row + 
2—142°(t — 1)th row successively for i = k, k — 1, --- , 2 in the order given 
and use Lemma 1 on k — 7 + 2 arguments. This increases by one the number of 
arguments of the ¢’s in each row. Now, perform ith row + 24~:43:(¢ — 1)th row 
fori = k,k — 1, --- ,3 and use Lemma 1 on k — 7 + 3 arguments, etc., until 







finally we have (k — 1)th row + 2,-kth row with Lemma 1 used for * arguments. 
This completes all the arguments in every ¢ of the determinant in (2.5). 


3. The mathematical expectation of the sum of the roots. The well-known 
distribution of the s non-zero roots obtained independently by R. A. Fisher 
[3], P. L. Hsu [4] and S. N. Roy [9] can be written [7] in the form 


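(3.1) $f(\theta_1, \ldots, \theta_s) = c(s, m, n) \prod_{i=1}^{s} \theta_i^{m}(1 - \theta_i)^{n} \prod_{i>j} (\theta_i - \theta_j)$, $\quad 0 < \theta_1 \le \cdots \le \theta_s < 1$,

where $m$ and $n$ have various interpretations which depend on the null-hypothesis
[6] and

(3.2) $c(s, m, n) = \pi^{s/2} \prod_{i=1}^{s} \dfrac{\Gamma\{\tfrac{1}{2}(2m + 2n + s + i + 2)\}}{\Gamma\{\tfrac{1}{2}(2m + i + 1)\}\,\Gamma\{\tfrac{1}{2}(2n + i + 1)\}\,\Gamma(\tfrac{1}{2} i)}$.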
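As a sanity check on the constant in (3.2), the sketch below evaluates it with the gamma function from the standard library and compares it with cases where the normalising integral of (3.1) can be done by hand: for $s = 1$ the density is a beta density, and for $s = 2$ with $m = n = 0$ the integral of $\theta_2 - \theta_1$ over $0 < \theta_1 < \theta_2 < 1$ equals $1/6$. The function name c_const is hypothetical.

```python
import math

def c_const(s, m, n):
    """Normalising constant c(s, m, n) as written in (3.2)."""
    value = math.pi ** (s / 2.0)
    for i in range(1, s + 1):
        value *= math.gamma(0.5 * (2 * m + 2 * n + s + i + 2))
        value /= (math.gamma(0.5 * (2 * m + i + 1))
                  * math.gamma(0.5 * (2 * n + i + 1))
                  * math.gamma(0.5 * i))
    return value

# s = 1: c should equal 1 / B(m + 1, n + 1); for m = 2, n = 3 this is 60.
print(c_const(1, 2, 3))
# s = 2, m = n = 0: c should equal 6.
print(c_const(2, 0, 0))
```

Agreement in these small cases supports the reconstruction of the constant but is of course not a proof for general $s$.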
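Denote the sum of the roots $\sum_{i=1}^{s} \theta_i$ by $V_1^{(s)}$, the subscript 1 indicating the first
e.s.f. (Pillai uses the $V^{(s)}$ notation.) Then the mathematical expectation of
$\exp(tV_1^{(s)})$ can easily be shown to take the determinantal form

(3.3) $E(e^{tV_1^{(s)}}) = \int \cdots \int f(\theta_1, \ldots, \theta_s)\, e^{tV_1^{(s)}} \prod_{i=1}^{s} d\theta_i$
$\qquad = c(s, m, n) \int_0^1 d\theta_s \int_0^{\theta_s} d\theta_{s-1} \cdots \int_0^{\theta_2} d\theta_1 \left| \theta_{s-i+1}^{m+s-j} (1 - \theta_{s-i+1})^{n} e^{t\theta_{s-i+1}} \right|$, $\quad i, j = 1, \ldots, s$,

where the $(i, j)$th element of the determinantal expression is given by
$\theta_{s-i+1}^{m+s-j}(1 - \theta_{s-i+1})^{n} e^{t\theta_{s-i+1}}$, by first noting that the product $\prod_{i>j}(\theta_i - \theta_j)$ of
$f(\theta_1, \ldots, \theta_s)$ is a Vandermonde determinant like (2.3) and then multiplying
the $i$th row of this determinant by $\theta_{s-i+1}^{m}(1 - \theta_{s-i+1})^{n} e^{t\theta_{s-i+1}}$, $i = 1, \ldots, s$.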

Now, denote (3.3) by $U(m + s - 1, \ldots, m; n; t)$ and more conveniently
by $U(s - 1, s - 2, \ldots, 0)$ with the $m$'s and $n$ omitted when $t = 0$.
To obtain in general the moments of the sum of the roots in determinantal form,
$U(m + s - 1, \ldots, m; n; t)$ is differentiated [5, 8] successively with respect to
$t$ and $t$ is set equal to zero after each differentiation. The lower-order moments
through the fourth moment are given by equations (3.2) through (3.5) of [8].
For the purpose of this paper, the third moment is reproduced below with $V^{(s)}$
changed to $V_1^{(s)}$,

(3.4) $E\{(V_1^{(s)})^3\} = c(s, m, n)\,[\,U(s + 2, s - 2, s - 3, \ldots, 1, 0)$
$\qquad + 2U(s + 1, s - 1, s - 3, \ldots, 1, 0)$
$\qquad + U(s, s - 1, s - 2, s - 4, \ldots, 1, 0)\,]$.

4. An alternative derivation of the moments of $V_1^{(s)}$. We indicate in this
section an alternative way to derive the moments of $V_1^{(s)}$, and we extend this method
later in Section 6 to obtain the moments of $V_j^{(s)}$, the $j$th e.s.f. of $s$ roots, which
are not yet available in the current literature except for $j = 1$ and $s$.
Consider the classes of functions of $\theta_1, \ldots, \theta_s$ of form


(4.1) U'(de,%-415,°°°,%35t) = Ot zj tt ete—s+1 Seo >a S mM, 


where 7,7 = 1, --- , 8, and are respectively the indices of the rows and columns 
of the determinantal expression. Denote those of form (4.1) for t = 0 and 
q; = qi — msimply by U'(q., gia, -*: , 91). Then U(m+ 8 — 1,m+ 8 — 2, 
-++ ,m;n;t) of Section 3, i.e., (3.3), may be rewritten as 


1 


6, Bans 6s 
c(s, m,n) [ dé, [ dée,_1 | tee [ dé, 
Jo 0 0 0 


-U'(m +8 —1,-+--,m;t)- II (1 — 6)”. 


i=l] 


(4.2) 


The class U(q., Gs-1, °°: , 913; ¢t) of functions of 6, --- , 0, generated by the 
successive differentiations of U(m + s — 1, m+ 8s — 2,---,m;n; t) with 
respect to ¢ can be represented by the class U’(q, , ds-1, «-* , 9 3 t) of functions 
generated by successive differentiations of U’(m + s — 1, --+ ,m;t). 


Since $U'(m + s - 1, \ldots, m; t)$ may be verified to be equal to

$\prod_{i=1}^{s} \theta_i^{m} e^{t\theta_i} \cdot \prod_{i>j} (\theta_i - \theta_j)$

by comparing (3.3) and (4.2),

(4.3) $E[(V_1^{(s)})^r] = d^r/dt^r \{E(e^{tV_1^{(s)}})\}|_{t=0}$
$\qquad = c(s, m, n) \int \cdots \int \left\{\prod_{i=1}^{s} \theta_i^{m} (V_1^{(s)})^r \prod_{i>j} (\theta_i - \theta_j)\right\} \left\{\prod_{i=1}^{s} (1 - \theta_i)^{n} d\theta_i\right\}$,

where the right side of the first equality indicates evaluation of the $r$th
derivative at $t = 0$. Obviously, the factor $(V_1^{(s)})^r \prod_{i>j} (\theta_i - \theta_j)$ is a linear combination
of functions in the class $U'(q_s, q_{s-1}, \ldots, q_1)$ and so moments of $V_1^{(s)}$ may be
derived alternatively by finding the necessary linear combination in this class
with the aid of the theorem in Section 2 applied in the reverse manner.

To illustrate now the alternative method of obtaining the moments of V}{", 
take the case of the third moment given by (3.4). Let ®, be the equivalent 
c.h.s. f. of @, in Section 2 when the arguments in 2’s are replaced by arguments in 
6’s. The initial choice of the s-order determinant in the class U’(qi , gi-1, *** 5 91) 
is suggested by 4} which is equivalent to (V{")*. Choose the U’-determinant 
such that we have the elements 4, , #; , ®; , ®o, --- , &o along the principal 
diagonal. The product of these diagonal elements is #} since @) = 1 by definition. 
We have 
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0 


= $} + 6, — 26, &. 


It may be remarked that if the determinant of (4.4) is multiplied by 
[].+;(@; — @;) and the theorem of Section 2 is applied, the determinant reduces 
to U’(s,s —1,s —2,s—4,---,1,0). 

We next wish to eliminate the product 4, in the right-hand side of (4.4). 
This suggests taking the s-order determinant in the U’-class with elements in 
the principal diagonal given by #2 , ®; , By , --- , Bo . Thus 


d, can, 
®; =e 0 


®, 41 ®,_; 


By the same principle as (4.4) above, the left determinant of (4.5) can be re- 
duced to U’(s + 1,s — 1,s — 3, --- , 1,0). Finally, after inspecting right sides 
of (4.4) and (4.5), we need 


®; 


which gives U’(s + 2,8 — 2,8 — 3, --- , 1, 0). Combining properly the right- 
hand sides of (4.4) through (4.6), we see the equivalence 


(4.7) $\Phi_1^3 = (\Phi_1^3 + \Phi_3 - 2\Phi_1\Phi_2) + 2(\Phi_1\Phi_2 - \Phi_3) + \Phi_3$.
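The decomposition in (4.7), combined with the theorem of Section 2, can be checked numerically for a small case. The sketch below takes $s = 4$ and $t = 0$, and assumes that the $U'$-determinants at $t = 0$ are the power determinants $|\theta_{s-i+1}^{q_j}|$ with the exponent lists quoted in the text; the helper names are hypothetical and the roots are arbitrary integers, used only to keep the arithmetic exact.

```python
from itertools import combinations_with_replacement
from math import prod

def chsf(p, xs):
    """Completely homogeneous symmetric function of degree p in xs."""
    if p < 0:
        return 0
    return sum(prod(c) for c in combinations_with_replacement(xs, p))

def det(M):
    """Determinant by Laplace expansion along the first row (small M only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * e * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j, e in enumerate(M[0]))

def power_det(thetas, exponents):
    """Determinant whose (i, j) element is theta_{s-i+1} ** q_j."""
    s = len(thetas)
    return det([[thetas[s - i] ** q for q in exponents] for i in range(1, s + 1)])

thetas = [1, 2, 3, 5]                       # s = 4
vandermonde = power_det(thetas, [3, 2, 1, 0])
P1, P2, P3 = (chsf(p, thetas) for p in (1, 2, 3))

piece_44 = power_det(thetas, [4, 3, 2, 0])  # U'(s, s-1, s-2, s-4, ...)
piece_45 = power_det(thetas, [5, 3, 1, 0])  # U'(s+1, s-1, s-3, ...)
piece_46 = power_det(thetas, [6, 2, 1, 0])  # U'(s+2, s-2, s-3, ...)

print(piece_44 == (P1 ** 3 + P3 - 2 * P1 * P2) * vandermonde)
print(piece_45 == (P1 * P2 - P3) * vandermonde)
print(piece_46 == P3 * vandermonde)
print(piece_44 + 2 * piece_45 + piece_46 == P1 ** 3 * vandermonde)
```

All four comparisons hold for this choice of roots, matching the coefficients 1, 2, 1 that appear in (3.4).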


Now multiply (4.4) through (4.6) by []Ti., 67(1 — 6:)" [].; (6; — 0;) and 
integrate over proper limits. We have 


E(&,)* = (8, m,n) [  [ {U%s,8 - l,s —2,s —4,---,1,0) 


+ 2U'(s+1,s —1,s — 3,---,1,0) 
+ U'(s + 2,8 —2,s — 3,---,1,0)} {]] oF(1 — 0;)” da;}, 
which reduces to (3.4) after noting how the $U$-class there and the $U'$-class here
have been defined. It may be remarked that the linear combination in the
$U'$-class is really $(V_1^{(s)})^3 \prod_{i>j} (\theta_i - \theta_j)$ by comparison with (4.3).

5. The kth compound of a matrix. In order to extend our results to the
moments of $V_j^{(s)}$, $j = 2, \ldots, s$, we need an important property of compound
matrices.

Consider the expansion of $[\prod_{i=1}^{k} (1 - x_i t)]^{-1}$ into a power series

(5.1) $\left[\prod_{i=1}^{k} (1 - x_i t)\right]^{-1} = 1 + \phi_1 t + \phi_2 t^2 + \cdots + \phi_r t^r + \cdots$

where $\phi_j$ is a c.h.s.f. in $k$ arguments. Let $a_j = \sum x_1 x_2 \cdots x_j$ be the $j$th e.s.f. with $k$
arguments in $x$'s. Multiplying both sides of (5.1) by $\prod_{i=1}^{k} (1 - x_i t) = \sum_{j=0}^{k} (-1)^j a_j t^j$
and equating coefficients, we have

(5.2) $\phi_1 - a_1 = 0$,
$\qquad \phi_2 - \phi_1 a_1 + a_2 = 0$,
$\qquad \phi_3 - \phi_2 a_1 + \phi_1 a_2 - a_3 = 0$,
$\qquad \cdots$

Dr-1 
where $a_0 = \phi_0 = 1$, then it may be checked that an alternative form of (5.2) is
(5.3) $(a)(\phi) = (\phi)(a) = I$
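The relations obtained by equating coefficients can be verified directly for particular arguments. The sketch below is an illustration with hypothetical helper names; it computes the e.s.f. and the c.h.s.f. by enumeration and checks that the alternating sum $\sum_{j=0}^{r} (-1)^j a_j \phi_{r-j}$ vanishes for each $r \ge 1$, with $a_0 = \phi_0 = 1$ and $a_j = 0$ for $j > k$.

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def esf(j, xs):
    """Elementary symmetric function a_j of the arguments xs."""
    return sum(prod(c) for c in combinations(xs, j))

def chsf(p, xs):
    """Completely homogeneous symmetric function phi_p of the arguments xs."""
    if p < 0:
        return 0
    return sum(prod(c) for c in combinations_with_replacement(xs, p))

def coefficient_relation(r, xs):
    """Coefficient of t**r in the product of the two series in (5.1)."""
    return sum((-1) ** j * esf(j, xs) * chsf(r - j, xs) for j in range(r + 1))

xs = [2, 3, 5, 7]
print([coefficient_relation(r, xs) for r in range(0, 7)])
```

The list begins with 1 (the constant term) and is zero afterwards, which is the statement that the two generating functions are reciprocal.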


where I is a unit matrix of order (k + 1). 

Now, consider the $k$th compound of a matrix $(b)$, denoted by $(b)^{(k)}$, defined
by a matrix whose elements are $k$-order minors of $\det(b)$ arranged in Aitken's
lexical sense (see [1], p. 90), i.e., minors which come from the same group of $k$
rows from $(b)$ are placed in the same rows in $(b)^{(k)}$, the order being decided by
the columns of $(b)$ that the minors contain in the same manner that words are
arranged in a dictionary. For instance, the minor containing columns 1, 3, 4 of
$(b)$ is preceded in the row of $(b)^{(3)}$ by those minors containing columns 1, 2,
$q$ for $q \ge 3$. The same definition holds if the words row, rows are replaced by
words column, columns, respectively, and vice versa.

Further, define the $k$th adjugate compound of $(b)$, denoted by $\operatorname{adj}^{(k)}(b)$, as the
transpose of the matrix formed from $(b)^{(k)}$ after replacing every element in
$(b)^{(k)}$ by its cofactor in $|b|$.
It may be noted that from the way $(b)^{(k)}$ and $\mathrm{adj}^{(k)}(b)$ are defined above,
$(b)^{(1)} = (b)$ and $\mathrm{adj}^{(1)}(b) = \mathrm{adj}(b)$. Furthermore, every element in the product
$(b)^{(k)} \mathrm{adj}^{(k)}(b)$ is a Laplace development according to k-ordered minors and their
cofactors in det(b). It may be checked easily that only the diagonal elements in
the product $(b)^{(k)} \mathrm{adj}^{(k)}(b)$ are expansions in minors by their algebraic
complements (for definition, see [2], p. 23) and each is equal to det(b). The off-diagonal
elements are sums of products of minors by the algebraic complements of some
other minors and each sum is therefore equal to zero. Hence, if (b) is of order n,

(5.4) $(b)^{(k)} \mathrm{adj}^{(k)}(b) = |b| I$,

where I is a unit matrix of order n!/{k!(n - k)!}.
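
Relation (5.4) can be illustrated on a small integer matrix. The sketch below is an assumption-laden illustration rather than the author's procedure: the helper names (`det`, `minor`, `compound`, `adjugate_compound`, `matmul`) are hypothetical, indices are 0-based, and the determinant is computed by naive cofactor expansion, which is adequate only for small orders. The lexical ordering of the minors is obtained from `itertools.combinations`.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (empty matrix -> 1)."""
    if not m:
        return 1
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def minor(m, rows, cols):
    """Minor of m built from the given row and column index tuples."""
    return det([[m[i][j] for j in cols] for i in rows])


def compound(m, k):
    """k-th compound: all k-order minors, index sets in lexical order."""
    idx = list(combinations(range(len(m)), k))
    return [[minor(m, r, c) for c in idx] for r in idx]


def adjugate_compound(m, k):
    """k-th adjugate compound: replace each minor by its cofactor, then transpose."""
    n = len(m)
    idx = list(combinations(range(n), k))

    def rest(s):
        return tuple(i for i in range(n) if i not in s)

    cof = [[(-1) ** (sum(r) + sum(c)) * minor(m, rest(r), rest(c)) for c in idx]
           for r in idx]
    return [list(col) for col in zip(*cof)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


if __name__ == "__main__":
    b = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
    print(det(b))
    print(matmul(compound(b, 2), adjugate_compound(b, 2)))
```

For k = 1 the two functions return the matrix itself and its ordinary adjugate, in agreement with the remark preceding (5.4).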
Consider now the product $(\varphi)(a)$ in (5.3). On applying the Binet-Cauchy
theorem (see [1], p. 93) and multiplying both sides of the equation by $(\mathrm{adj}(a))^{(k)}$,
we have

(5.5) $(\varphi)^{(k)} (a)^{(k)} (\mathrm{adj}(a))^{(k)} = (\mathrm{adj}(a))^{(k)}$.

It may be recalled that $(a)\,\mathrm{adj}(a) = |a| I$ and by the Binet-Cauchy theorem, the
kth compound is

(5.6) $(a)^{(k)} (\mathrm{adj}(a))^{(k)} = |a|^{k} I$.

Furthermore, comparing (op. cit., p. 98) the equality in (5.6) with (5.4) after
replacing (b) by (a), we have

(5.7) $(\mathrm{adj}(a))^{(k)} = |a|^{k-1} \mathrm{adj}^{(k)}(a)$.

Using (5.6) and (5.7) and noting that |a| = 1, (5.5) reduces finally to

(5.8) $(\varphi)^{(k)} = (\mathrm{adj}(a))^{(k)} = \mathrm{adj}^{(k)}(a)$.

From the nature of the construction of $(\varphi)^{(k)}$ and $\mathrm{adj}^{(k)}(a)$, the last equality of
(5.8) reveals an inner relationship of minors of $|\varphi|$ and |a| which plays a key role
in the next section.

If the columns of elements of $(\varphi)^{(k')}$, which are k'-order minors of the (k + 1)-
order determinant $|\varphi|$, are labelled by their highest suffixes occurring in the
columns and if the same method of labelling is used for the elements of $\mathrm{adj}^{(k')}(a)$,
which are (k + 1 - k')-order minors of |a|, then the two sets of suffixes form a
bicomplementary set with respect to the highest index. Specifically, we restrict
our use of (5.8) only to those minors with consecutive suffixes in the columns.

For example, if an element of $(\varphi)^{(3)}$ has column indices labelled 4, 2, 1 then
the indices missing in the sequence of numbers 4, 3, 2, 1, 0 are 3, 0. Thus the
complementary indices with respect to the highest index 4 in the corresponding
$\mathrm{adj}^{(3)}(a)$ have labels 1, 4 in reverse order. Hence,

lo: oy 0 
| — ids do 


ids oy 
os 


as — 1; | 
6. The mathematical expectations of the e.s.f.’s. From the property (5.8)
and the theorem in Section 2, the inverse derivation of the moments of the first 
e.s.f. of s roots may now be extended to any e.s.f. As an illustration, take the 
second moment of the second e.s.f. for the case of s = 3. 

If the arguments in z’s of the ¢’s and a’s of Section 5 are now replaced by 
arguments in 6’s, then ¢; — #; and a; — V$” = V;, say, with superscripts 
omitted if the meaning is clear from the context. The second moment suggests 
taking the V-determinant with V2 , V2 in the principal diagonal so that the corre- 
sponding &-determinant is of order 3 (since s = 3). Thus 

| V. _ Vi + 
(6.1) |= V2; — Viv; 
\-V; Vs 


which suggests to add to (6.1) a V-determinant with V,, V; in the principal 
diagonal. Thus, we have only 
V2 =—Vs Vs 
(6.2) | : = V 
| J 4 —7 Vi 


since V, = 0 in the case of s = 3. Using (5.8) the equivalent ®-determinants of 
the V-determinants are 


i> & O| |% & O 


(6.3) % &% O|+ | OB & 
, & BD % 2 9, 


on noting that the subscripts (in reverse order) of the last rows of the V-deter- 
minants form a bicomplementary set with respect to the highest index 4 with 
the missing subscripts in the last rows of the @-determinants. 

The next obvious step to find E[( 2) is to employ directly the method of 
Section 4 on (6.3). This gives 


E{(V$’)*] = ¢(3, m, n)[---[ {U’(4, 3,0) + U'(4, 2,1) {TT (1 — 0)" ox} 


= c(3, m, n)[U(4, 3,0) + U(4, 2, 1)]. 


The author is presently tabulating the lower-order moments of $V_i^{(s)}$, i = 2, 3, 4
and s = 2, 3, 4 for values of 2m = -1(1)10(10)60(20)120 and 2n = 10(10)200.
The results will be reported at some future time. 


7. Acknowledgment. I am indebted to the referee for pointing out an error 
in the proof of the theorem (Section 2) in an earlier draft and for his other useful 
criticisms and suggestions. I am also indebted to Professor William Kruskal for 
his helpful comments while this paper was in preparation. 
VARIANCE COMPONENTS IN THE UNBALANCED 2-WAY NESTED
CLASSIFICATION


By S. R. SEARLE
New Zealand Dairy Board, Wellington, New Zealand


Introduction. Sampling variances of estimates of components of variance 
obtained from data that are balanced (having the same number of observations 
in all subclasses) are easily derived because the mean squares in the analysis of 
variance are independent and distributed as $\chi^2$. The variance component estimates
are linear functions of the mean squares and their variances can be derived
accordingly, although their distributions are, in general, unknown. When the 
data are not balanced, however, and there are unequal numbers of observations 
in the subclasses the mean squares are no longer independent and they do not 
have $\chi^2$-distributions. Methods of deriving expressions for the sampling variances
of the variance component estimates are developed for these situations in an 
earlier paper [3] and applied to the 1-way classification. A second paper [4] gives 
these expressions for the 2-way factorial classification, and extension to the 
2-way hierarchical (nested) classification is presented here. 


Model and analysis of variance. The earlier work discussed sampling variances
of variance component estimates obtained by Henderson’s Method 1 [2] from
data having unequal subclass numbers, based on the completely random model,
namely Eisenhart’s Model II [1]. The same situation is considered here for the
2-way nested classification.

The linear model for an observation $x_{ijk}$ is taken as

$x_{ijk} = \mu + \alpha_i + \beta_{ij} + e_{ijk}$

where $\mu$ is the general mean, $\alpha_i$ is the effect due to the ith main classification, $\beta_{ij}$
is the effect due to the jth sub-class within the ith main classification, and $e_{ijk}$
is the residual error term peculiar to $x_{ijk}$. We suppose the number of classes in
the main classification is a, so that $i = 1, \cdots, a$; and that there are $c_i$ sub-classes
within each of these so that $j = 1, \cdots, c_i$. The total number of such sub-classes
will be represented by b, giving $b = \sum_{i=1}^{a} c_i$. The number of observations in
the jth subclass of the ith class is taken as $n_{ij}$. All terms of the model (except
$\mu$) are assumed to be normally distributed random variables with zero means
and variances $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma_e^2$. These are the variance components to be estimated,
along with the sampling variances of their estimates.


Received December 4, 1960; revised June 30, 1961. 
The customary analysis of variance can be written as

Analysis of Variance

Term                                      d.f.     Sums of Squares
Between main classes                      a - 1    $T_a - T_f$
Between subclasses within main classes    b - a    $T_{ab} - T_a$
Within subclasses                         N - b    $T_o - T_{ab}$
Total                                     N - 1    $T_o - T_f$

where, with the customary notation for totals and means, namely

$x_{i..} = \sum_j \sum_k x_{ijk}$, $n_{i.} = \sum_j n_{ij}$ and $\bar{x}_{i..} = x_{i..}/n_{i.}$,

we have the uncorrected sums of squares

$T_a = \sum_i n_{i.} \bar{x}_{i..}^2$,

$T_{ab} = \sum_i \sum_j n_{ij} \bar{x}_{ij.}^2$,

$T_o = \sum_i \sum_j \sum_k x_{ijk}^2$

and

$T_f = N \bar{x}_{...}^2$,

N being the total number of observations, $N = \sum_i \sum_j n_{ij}$.
The variance components can be estimated by equating each line of the above
analysis of variance (except that for “total”) to its expected value. Denoting
the resulting estimates as $\hat\sigma_\alpha^2$, $\hat\sigma_\beta^2$ and $\hat\sigma_e^2$, the equations for obtaining them are,
as given in Section 10.17 of [5],

(1)
$T_a - T_f = (N - k_1)\hat\sigma_\alpha^2 + (k_{12} - k_3)\hat\sigma_\beta^2 + (a - 1)\hat\sigma_e^2$
$T_{ab} - T_a = (N - k_{12})\hat\sigma_\beta^2 + (b - a)\hat\sigma_e^2$
$T_o - T_{ab} = (N - b)\hat\sigma_e^2$.

The k’s are functions of the $n_{ij}$’s, namely

$k_1 = \sum_i n_{i.}^2 / N$

$k_3 = \sum_i \sum_j n_{ij}^2 / N$

$k_{12} = \sum_i (\sum_j n_{ij}^2)/n_{i.}$.
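
Equations (1) are triangular and can be solved from the last line upward. The following sketch shows one way to do this for data held as nested lists; the function name `nested_variance_components` and the data layout (a list of main classes, each a list of subclasses, each a list of observations) are assumptions made for illustration only.

```python
def nested_variance_components(data):
    """Solve equations (1) for the estimates of sigma_alpha^2, sigma_beta^2, sigma_e^2.

    data[i][j] is the list of observations in subclass j of main class i.
    """
    n_ij = [[len(sub) for sub in cls] for cls in data]
    n_i = [sum(row) for row in n_ij]
    N = sum(n_i)
    a = len(data)
    b = sum(len(cls) for cls in data)

    # Functions of the subclass numbers.
    k1 = sum(n * n for n in n_i) / N
    k3 = sum(n * n for row in n_ij for n in row) / N
    k12 = sum(sum(n * n for n in row) / ni for row, ni in zip(n_ij, n_i))

    # Uncorrected sums of squares.
    T_o = sum(x * x for cls in data for sub in cls for x in sub)
    T_ab = sum(sum(sub) ** 2 / len(sub) for cls in data for sub in cls)
    T_a = sum(sum(x for sub in cls for x in sub) ** 2 / ni
              for cls, ni in zip(data, n_i))
    T_f = sum(x for cls in data for sub in cls for x in sub) ** 2 / N

    # Back-substitution through equations (1), last line first.
    s2_e = (T_o - T_ab) / (N - b)
    s2_b = (T_ab - T_a - (b - a) * s2_e) / (N - k12)
    s2_a = (T_a - T_f - (k12 - k3) * s2_b - (a - 1) * s2_e) / (N - k1)
    return s2_a, s2_b, s2_e


if __name__ == "__main__":
    data = [[[1.0, 1.0], [3.0, 3.0]], [[5.0, 5.0], [7.0, 7.0]]]
    print(nested_variance_components(data))
```

The estimates are not constrained to be non-negative; as with any analysis-of-variance estimator, negative values can occur.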


The notation here follows that used previously in 







Variances and covariances required. The within sub-classes sum of squares,
$T_o - T_{ab}$, has a $\chi^2$-distribution with (N - b) degrees of freedom and hence
the variance of $\hat\sigma_e^2$ is

(2) $\mathrm{var}(\hat\sigma_e^2) = 2\sigma_e^4/(N - b)$.

Furthermore, $T_o - T_{ab}$ is distributed independently of $T_a$, $T_{ab}$ and $T_f$ so that
the covariances of $\hat\sigma_\alpha^2$ and $\hat\sigma_\beta^2$ with $\hat\sigma_e^2$ are obtained directly as

(3) $\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2) = -(b - a)\,\mathrm{var}(\hat\sigma_e^2)/(N - k_{12})$

and

(4) $\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_e^2) = -[(k_{12} - k_3)\,\mathrm{cov}(\hat\sigma_\beta^2, \hat\sigma_e^2) + (a - 1)\,\mathrm{var}(\hat\sigma_e^2)]/(N - k_1)$
$= [(k_{12} - k_3)(b - a)/(N - k_{12}) - (a - 1)]\,\mathrm{var}(\hat\sigma_e^2)/(N - k_1)$.

This independence property is also used for obtaining the variances of $\hat\sigma_\alpha^2$ and
$\hat\sigma_\beta^2$ and the covariance between them as linear functions of $\mathrm{var}(\hat\sigma_e^2)$ and the
variances and covariances of $T_a$, $T_{ab}$ and $T_f$. Thus

(5) $(N - k_1)^2 (N - k_{12})^2\,\mathrm{var}(\hat\sigma_\alpha^2)$
$= \mathrm{var}[(N - k_3)T_a - (k_{12} - k_3)T_{ab} - (N - k_{12})T_f]$
$+ [(N - k_3)a - (k_{12} - k_3)b - (N - k_{12})]^2\,\mathrm{var}(\hat\sigma_e^2)$

(6) $\mathrm{var}(\hat\sigma_\beta^2) = [\mathrm{var}(T_{ab} - T_a) + (b - a)^2\,\mathrm{var}(\hat\sigma_e^2)]/(N - k_{12})^2$

(7) $(N - k_1)(N - k_{12})\,\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\beta^2)$
$= \mathrm{cov}(T_a - T_f,\, T_{ab} - T_a) + (a - 1)(b - a)\,\mathrm{var}(\hat\sigma_e^2)$
$- (N - k_{12})(k_{12} - k_3)\,\mathrm{var}(\hat\sigma_\beta^2)$.

The second term in each of these expressions can be obtained from equation (2)
and the first can be found as a linear function of the variances and covariances
of $T_a$, $T_{ab}$ and $T_f$. These we now proceed to find.


Matrix methods. The sampling variance of a quadratic function, x’Fx, of
normally-distributed random variables represented by the vector x is $2\,\mathrm{tr}(VF)^2$
where V is the variance-covariance matrix appropriate to the variables in x.
The covariance between two quadratics x’Fx and x’Gx is $2\,\mathrm{tr}(VFVG)$. These
results can be applied to obtain the terms needed for equations (5) through (7)
using matrices similar to those employed in Searle [4]. First we define square
matrices $U_{ij}$, $U_{i.}$ and $U_N$ of order $n_{ij}$, $n_{i.}$ and N respectively, with all elements
equal to one. Square matrices of order N with U-matrices in the diagonal and
zeros elsewhere are defined as D-matrices; thus $D_{ab}$ has the matrices $U_{ij}$ in its
diagonal, for all values of i and j, and $D_a$ has the matrices $U_{i.}$ in its diagonal, for
all values of i. The variance-covariance matrix of the N observations arrayed
in order $k = 1, \cdots, n_{ij}$ within j-classes within each i-class can now be expressed as

(8) $V = \sigma_\alpha^2 D_a + \sigma_\beta^2 D_{ab} + \sigma_e^2 I$,

I being an identity matrix.

Defining $C_{ab}$ and $C_a$ similar to $D_{ab}$ and $D_a$ only with matrices $(n_{ij})^{-1} U_{ij}$ and
$(n_{i.})^{-1} U_{i.}$ in the diagonal enables the quadratics in the analysis of variance
to be written as

$T_a = x'C_a x$

$T_{ab} = x'C_{ab} x$

$T_f = x'U_N x/N$.

Hence

$\mathrm{var}(T_a) = 2\,\mathrm{tr}(VC_a)^2 = 2\,\mathrm{tr}(\sigma_\alpha^2 D_a + \sigma_\beta^2 D_{ab}C_a + \sigma_e^2 C_a)^2$

after substitution from (8). This is a quadratic in the variance components
which can be expanded, through the special form of the matrices, in terms of
the $n_{ij}$’s using the expressions

$k_4 = \sum_i \sum_j n_{ij}^3$, $k_5 = \sum_i (\sum_j n_{ij}^3)/n_{i.}$

$k_6 = \sum_i (\sum_j n_{ij}^2)^2/n_{i.}$, $k_7 = \sum_i (\sum_j n_{ij}^2)^2/n_{i.}^2$

$k_8 = \sum_i n_{i.}(\sum_j n_{ij}^2)$, $k_9 = \sum_i n_{i.}^3$.

Thus

$\mathrm{var}(T_a) = 2(Nk_1\sigma_\alpha^4 + k_7\sigma_\beta^4 + a\sigma_e^4 + 2Nk_3\sigma_\alpha^2\sigma_\beta^2 + 2N\sigma_\alpha^2\sigma_e^2 + 2k_{12}\sigma_\beta^2\sigma_e^2)$.


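A similar procedure for the other terms in (5), (6) and (7) leads to the following
results:

$\mathrm{var}(T_{ab}) = 2\,\mathrm{tr}(VC_{ab})^2 = \mathrm{var}(T_a) + 2[(Nk_3 - k_7)\sigma_\beta^4 + (b - a)\sigma_e^4 + 2(N - k_{12})\sigma_\beta^2\sigma_e^2]$

$\mathrm{var}(T_f) = 2\,\mathrm{tr}(VU_N)^2/N^2 = 2(k_1\sigma_\alpha^2 + k_3\sigma_\beta^2 + \sigma_e^2)^2$

$\mathrm{cov}(T_a, T_{ab}) = 2\,\mathrm{tr}(VC_aVC_{ab}) = \mathrm{var}(T_a) + 2(k_5 - k_7)\sigma_\beta^4$

$\mathrm{cov}(T_a, T_f) = 2\,\mathrm{tr}(VC_aVU_N)/N$
$= 2[(k_9/N)\sigma_\alpha^4 + (k_6/N)\sigma_\beta^4 + \sigma_e^4 + 2(k_8/N)\sigma_\alpha^2\sigma_\beta^2 + 2k_1\sigma_\alpha^2\sigma_e^2 + 2k_3\sigma_\beta^2\sigma_e^2]$

$\mathrm{cov}(T_{ab}, T_f) = 2\,\mathrm{tr}(VC_{ab}VU_N)/N = \mathrm{cov}(T_a, T_f) + 2\sigma_\beta^4(k_4 - k_6)/N$.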

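Results. Substituting the above expressions into (5) leads, after simplification, to

$\mathrm{var}(\hat\sigma_\alpha^2) = 2(\lambda_1\sigma_\alpha^4 + \lambda_2\sigma_\beta^4 + \lambda_3\sigma_e^4 + 2\lambda_4\sigma_\alpha^2\sigma_\beta^2 + 2\lambda_5\sigma_\alpha^2\sigma_e^2 + 2\lambda_6\sigma_\beta^2\sigma_e^2)/[(N - k_1)^2(N - k_{12})^2]$

where

$\lambda_1 = (N - k_{12})^2[k_1(N + k_1) - 2k_9/N]$,

$\lambda_2 = k_3[N(k_{12} - k_3)^2 + k_3(N - k_{12})^2] + (N - k_3)^2 k_7 - 2(N - k_3)[(k_{12} - k_3)k_5 + (N - k_{12})k_6/N] + 2(N - k_{12})(k_{12} - k_3)k_4/N$,

$\lambda_3 = [(N - k_{12})^2(N - 1)(a - 1) - (N - k_3)^2(a - 1)(b - a) + (k_{12} - k_3)^2(N - 1)(b - a)]/(N - b)$,

$\lambda_4 = (N - k_{12})^2[k_3(N + k_1) - 2k_8/N]$,

$\lambda_5 = (N - k_{12})^2(N - k_1)$,

$\lambda_6 = (N - k_{12})(N - k_3)(k_{12} - k_3)$.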


Similarly, expression (6) becomes

$\mathrm{var}(\hat\sigma_\beta^2) = [2(k_7 + Nk_3 - 2k_5)\sigma_\beta^4 + 4(N - k_{12})\sigma_\beta^2\sigma_e^2 + 2(b - a)(N - a)\sigma_e^4/(N - b)]/(N - k_{12})^2$

and (7) reduces to

$(N - k_1)(N - k_{12})\,\mathrm{cov}(\hat\sigma_\alpha^2, \hat\sigma_\beta^2) = 2[k_5 - k_7 + (k_6 - k_4)/N]\sigma_\beta^4 + 2(a - 1)(b - a)\sigma_e^4/(N - b) - (N - k_{12})(k_{12} - k_3)\,\mathrm{var}(\hat\sigma_\beta^2)$.

These variances are in terms of the unknown variance components $\sigma_\alpha^2$, $\sigma_\beta^2$ and
$\sigma_e^2$, so that estimation of the variances in any particular case is only possible by
replacing the components in these formulae by their estimates.
Balanced data. The above formulae reduce to the well-known results for
balanced data when all the $n_{ij}$ are put equal, to n, say. Suppose that all levels
of the main classification have c sub-classes so that b = ac. Then, for example,

$\mathrm{var}(\hat\sigma_\beta^2) = \{2(an^2 + acn^2 - 2an^2)\sigma_\beta^4 + 4an(c - 1)\sigma_\beta^2\sigma_e^2 + 2a(c - 1)(acn - a)\sigma_e^4/[ac(n - 1)]\}/[a^2n^2(c - 1)^2]$

which reduces to

$\mathrm{var}(\hat\sigma_\beta^2) = \frac{2}{n^2}\left[\frac{(n\sigma_\beta^2 + \sigma_e^2)^2}{a(c - 1)} + \frac{\sigma_e^4}{ac(n - 1)}\right]$.

This is the result obtained directly for the balanced case when $T_{ab} - T_a$ and
$T_o - T_{ab}$ are distributed independently as $\chi^2$ with a(c - 1) and ac(n - 1)
degrees of freedom respectively. Their expectations, obtained from equation
(1), are

$E(T_{ab} - T_a) = a(c - 1)(n\sigma_\beta^2 + \sigma_e^2)$

and

$E(T_o - T_{ab}) = ac(n - 1)\sigma_e^2$

and their variances equal twice the square of their expectations divided by their
degrees of freedom. The variance of the estimate of $\sigma_\beta^2$, namely

$\hat\sigma_\beta^2 = \frac{1}{n}\left[\frac{T_{ab} - T_a}{a(c - 1)} - \frac{T_o - T_{ab}}{ac(n - 1)}\right]$,

is accordingly as shown above.
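
The reduction to the balanced case can also be checked numerically. The sketch below evaluates the general expression for the variance of the estimate of $\sigma_\beta^2$ from the subclass numbers and compares it with the balanced closed form; the function names `var_sigma_beta_hat` and `var_sigma_beta_hat_balanced` and the argument layout (a list of lists of $n_{ij}$) are hypothetical choices for illustration.

```python
def var_sigma_beta_hat(n_ij, s2_beta, s2_e):
    """General formula: variance of the estimate of sigma_beta^2 for subclass numbers n_ij."""
    n_i = [sum(row) for row in n_ij]
    N = sum(n_i)
    a = len(n_ij)
    b = sum(len(row) for row in n_ij)
    k3 = sum(n * n for row in n_ij for n in row) / N
    k5 = sum(sum(n ** 3 for n in row) / ni for row, ni in zip(n_ij, n_i))
    k7 = sum(sum(n * n for n in row) ** 2 / ni ** 2 for row, ni in zip(n_ij, n_i))
    k12 = sum(sum(n * n for n in row) / ni for row, ni in zip(n_ij, n_i))
    numerator = (2 * (k7 + N * k3 - 2 * k5) * s2_beta ** 2
                 + 4 * (N - k12) * s2_beta * s2_e
                 + 2 * (b - a) * (N - a) * s2_e ** 2 / (N - b))
    return numerator / (N - k12) ** 2


def var_sigma_beta_hat_balanced(a, c, n, s2_beta, s2_e):
    """Balanced closed form with a main classes, c subclasses each, n observations each."""
    return (2 / n ** 2) * ((n * s2_beta + s2_e) ** 2 / (a * (c - 1))
                           + s2_e ** 2 / (a * c * (n - 1)))


if __name__ == "__main__":
    a, c, n = 3, 2, 4
    balanced_counts = [[n] * c for _ in range(a)]
    print(var_sigma_beta_hat(balanced_counts, 1.5, 2.0))
    print(var_sigma_beta_hat_balanced(a, c, n, 1.5, 2.0))
```

In practice the unknown components passed to these functions would be replaced by their estimates, as noted in the Results paragraph.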
SOME MAIN-EFFECT PLANS AND ORTHOGONAL
ARRAYS OF STRENGTH TWO


By SIDNEY ADDELMAN AND OSCAR KEMPTHORNE
Iowa State University


1. Summary. In this paper we present a method of constructing main-effect
plans for symmetrical factorial experiments which can accommodate up to
$[2(s^n - 1)/(s - 1) - 1]$ factors, each at $s = p^m$ levels, where p is a prime, with
$2s^n$ treatment combinations. As main-effect plans are orthogonal arrays of
strength two the method presented permits the construction of the orthogonal
arrays $(2s^n, 2[s^n - 1]/[s - 1] - 1, s, 2)$.


2. Introduction. Let there be k factors each of which can assume $s = p^m$ levels,
where p is a prime number. An orthogonal array of strength d, of size N, with k
constraints and s levels consists of a subset of N treatment combinations from
an $s^k$ factorial experiment with the property that all $s^d$ treatment combinations
corresponding to any d factors chosen from the k occur an equal number of times
in the subset. The array may be denoted by (N, k, s, d).

The concept of orthogonal arrays was first introduced by Rao [1]. He dis- 
cussed the use of these arrays as fractionally replicated plans for symmetrical 
factorial experiments which permit the estimation of main-effects and inter- 
actions up to order (¢d — 2) when higher order interactions are negligible. 

The plans for fractionally replicated symmetrical factorial experiments which 
are developed in this paper are orthogonal arrays of strength two. We call these 
plans main-effect plans because they permit orthogonal estimation of all the 
main-effects when the interactions are negligible. 

The main-effect plans derivable from the system of confounding developed by
Fisher [2] can be represented by the orthogonal arrays $(s^n, (s^n - 1)/(s - 1), s, 2)$.
These plans fall within the class of optimum multifactorial designs
which were considered by Plackett and Burman [3].

It has been shown by Bose [4] that the maximum number of factors that it is
possible to accommodate in a symmetrical factorial experiment in which each
factor occurs at $s = p^m$ levels and each block is of size $s^n$, without confounding
any d-factor or lower order interaction, is given by the maximum number of points
that it is possible to choose in the finite projective geometry $PG(n - 1, p^m)$
so that no d of the chosen points are conjoint. This is equivalent to showing
that the maximum number of constraints k in the orthogonal array $(s^n, k, s, d)$
is given by the maximum number of points it is possible to choose in
$PG(n - 1, p^m)$ so that no d of the chosen points are conjoint. Clearly the
maximum number of constraints in the orthogonal array $(s^n, k, s, 2)$ is equal to
the number of points of $PG(n - 1, p^m)$. Thus the maximum value of k is
$(s^n - 1)/(s - 1)$. These facts are relevant in view of the method of construction
to be presented.


3. Preliminary notation and lemmas. The finite projective geometry
$PG(n - 1, p^m)$ is a geometrical representation of n factors each at $s = p^m$ levels
and their generalized interactions. We shall represent these n factors by
$X_1, X_2, \cdots, X_n$ and their generalized interactions by $k_1X_1 + k_2X_2 + \cdots + k_nX_n$
where the $k_i$ can take on any value of the Galois field $GF(p^m)$ and it is understood
that the coefficient of the first factor appearing in an interaction is unity.

Let $u_0, u_1, \cdots, u_{s-1}$ represent the elements of $GF(p^m)$ and let $u_0^2, u_1^2, \cdots, u_{s-1}^2$
represent the squares of the elements of $GF(p^m)$. We shall denote the
set of squared elements of $GF(p^m)$ by $GF^2(p^m)$. It is easily verified that apart
from the 0 element the set $GF^2(p^m)$ forms a cyclic Abelian group under multiplication.
It follows from the cyclic property that (i) when p = 2, $GF^2(p^m)$
contains each of the elements of $GF(p^m)$ and (ii) when p is an odd prime, the
elements of $GF^2(p^m)$ comprise a subset of $\frac{1}{2}(s + 1)$ distinct elements of $GF(p^m)$,
where one element occurs once and $\frac{1}{2}(s - 1)$ elements are duplicated.

Consider one of the factors $X_i$ in a main-effect plan in which each $X_i$ has s
levels, each occurring $s^{n-1}$ times in a total of $s^n$ treatment combinations. Let
$X_i^2$ be a pseudo-factor obtained by squaring the levels of $X_i$. We now present the
following lemmas:

Lemma 1. When p is an odd prime, $X_i^2 + kX_i$ (k an element of $GF(p^m)$) contains
$\frac{1}{2}(s + 1)$ distinct levels, one level occurring $s^{n-1}$ times and $\frac{1}{2}(s - 1)$ levels
occurring $2s^{n-1}$ times in $s^n$ treatment combinations.

Lemma 2. When p = 2, $X_i^2$ contains each of the s levels $s^{n-1}$ times.

Lemma 3. When p = 2, $X_i^2 + kX_i$ (k any element of $GF(p^m)$ except 0) contains
$\frac{1}{2}s$ distinct levels each occurring $2s^{n-1}$ times.

Lemma 3 can be proved as follows. Let $x_i$ range over the elements of $GF(p^m)$
which represent the s levels of $X_i$. As $x_i$ ranges over the elements of the field so
does $x_i + k$ where k is an element of $GF(p^m)$. Also if $x_i + k = x_j$ (mod 2) then
$x_j + k = x_i$ (mod 2). Hence $x_i(x_i + k) = x_ix_j$ and $x_j(x_j + k) = x_jx_i$. Thus
whatever values of $x_i(x_i + k)$ are achieved they are achieved for at least two
values of $x_i$.

It will now be shown that the values of $x_i(x_i + k)$ are achieved for exactly
two values of $x_i$. Let y be the generator of the field and let $x_i = y^a$ and $k = y^c$.
Thus $x_i(x_i + k) = y^a(y^a + y^c)$. Suppose that

$y^a(y^a + y^c) = y^b(y^b + y^c)$

where

$y^a \neq y^b$ and $y^a + y^c \neq y^b$.

Hence

$(y^a + y^c)y^a = (y^b + y^c)y^b$
$(y^a + y^c)y^a + (y^b + y^c)y^b = 0$
$(y^a + y^b)(y^a + y^b + y^c) = 0$.

This implies that either $y^a + y^b = 0$ and therefore $y^a = y^b$, which is a contradiction,
or that $y^a + y^b + y^c = 0$ and therefore $y^a + y^c = y^b$, which is a contradiction.
Hence the values of $x_i(x_i + k)$ are achieved for exactly two values of $x_i$ and
Lemma 3 is proved.
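
Lemma 1 is easy to illustrate when s = p is itself an odd prime, so that the field arithmetic is ordinary arithmetic modulo p. The sketch below takes n = 1 (each level of $X_i$ occurring once); for general n every multiplicity is simply scaled by $s^{n-1}$. The function names `level_counts` and `satisfies_lemma_1` are hypothetical, and prime-power fields with m > 1 are not handled.

```python
from collections import Counter


def level_counts(p, k):
    """Multiplicity of each level of x^2 + k*x as x ranges over GF(p), p prime."""
    return Counter((x * x + k * x) % p for x in range(p))


def satisfies_lemma_1(p, k):
    """True if one level occurs once and (p - 1)/2 levels occur twice."""
    multiplicities = sorted(level_counts(p, k).values())
    return multiplicities == [1] + [2] * ((p - 1) // 2)


if __name__ == "__main__":
    print(dict(level_counts(5, 1)))
    print(all(satisfies_lemma_1(7, k) for k in range(7)))
```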

Lemma 4. The factor represented by $X_i^2 + k_iX_i + \sum_{j \neq i} k_jX_j$ ($k_i$ and $k_j$ elements
of $GF(p^m)$) where at least one $k_j \neq 0$, contains each of the s levels $s^{n-1}$ times.

Lemma 5. The levels of $X_i^2 + k_1X_i + k_2X_j$ which occur in a plan with the $u_t$
level of $a_1X_i + a_2X_j$, where $k_1$, $k_2$, $a_1$ and $a_2$ are elements of $GF(p^m)$ and $a_2 \neq 0$,
are given by the values of $x_i^2 + k_1x_i + k_2x_j + c(a_1x_i + a_2x_j) - cu_t$ where $k_2 + ca_2 = 0$
and $x_i$ ranges over the elements of $GF(p^m)$.

Proof. When $a_1X_i + a_2X_j$ takes on the $u_t$ level then $a_1x_i + a_2x_j = u_t$ and thus

$x_j = (u_t - a_1x_i)/a_2$.

Hence the levels of the factor $X_i^2 + k_1X_i + k_2X_j$ which occur with the level $u_t$
of $a_1X_i + a_2X_j$ can be represented by

$x_i^2 + k_1x_i + k_2x_j = x_i^2 + k_1x_i + k_2(u_t - a_1x_i)/a_2$
$= x_i^2 + (k_1 - k_2a_1/a_2)x_i + (k_2/a_2)u_t$.

Since $k_2 + ca_2 = 0$, then $c = -k_2/a_2$. Thus

$x_i^2 + (k_1 - k_2a_1/a_2)x_i + (k_2/a_2)u_t = x_i^2 + k_1x_i + k_2x_j + c(a_1x_i + a_2x_j) - cu_t$,

and the lemma is proved.

Two factors $X_i$ and $X_j$ are said to be orthogonal to each other if each level of
$X_i$ occurs the same number of times with every level of $X_j$. Two factors $X_i$ and
$X_j$ are said to be semi-orthogonal to each other if (i) for p an odd prime, one
level of $X_j$ occurs $s^{n-2}$ times and $\frac{1}{2}(s - 1)$ levels of $X_j$ each occur $2s^{n-2}$ times with
each level of $X_i$ and (ii) for p = 2, $\frac{1}{2}s$ levels of $X_j$ each occur $2s^{n-2}$ times with
each level of $X_i$.

It follows from Lemmas 1, 3, and 5 that when p is an odd prime or when
$k_1 - k_2a_1/a_2 \neq 0$, then $a_1X_i + a_2X_j$ is semi-orthogonal to $X_i^2 + k_1X_i + k_2X_j$.
It follows from Lemmas 2 and 5 that when p = 2 and $k_1 - k_2a_1/a_2 = 0$ then
$a_1X_i + a_2X_j$ is orthogonal to $X_i^2 + k_1X_i + k_2X_j$. Employing an argument similar
to that used in Lemma 5 it can be deduced that $kX_i^2 + k_1X_i + X_j$ and
$kX_i^2 + k_2X_i + X_j$ are orthogonal to each other when $k_1 \neq k_2$.

Lemma 5 can be generalized to include more than two factors as stated in 
Lemma 5a. 
Lemma 5a. The levels of $X_i^2 + k_iX_i + \sum_{j \neq i} k_jX_j$ which occur in a plan with
the $u_t$ level of $a_iX_i + \sum_{j \neq i} a_jX_j$ are given by the values of

$x_i^2 + k_ix_i + \sum_{j \neq i} k_jx_j + c(a_ix_i + \sum_{j \neq i} a_jx_j) - cu_t$

where $k_j + ca_j = 0$ for all $j \neq i$. If the $a_j$ and the $k_j$ are not of such a form that
$k_j + ca_j = 0$ for all $j \neq i$ and some c contained in $GF(p^m)$ then the two factors
are orthogonal.

Lemma 6. When p is a prime the complements in $GF(p^m)$ to the elements in
$GF^2(p^m)$ are the set of elements in $GF^2(p^m)$ each multiplied by an element of $GF(p^m)$
which is not an element of $GF^2(p^m)$. If the set of elements in $GF^2(p^m)$ and their
set of complements are taken together in one set, the elements of $GF(p^m)$ are obtained.

Proof. From abstract group theory (see Birkhoff and Mac Lane [5]) we employ
a lemma which states that two right cosets of a subgroup are either identical or
without common elements. Now the elements of $GF^2(p^m)$ form an Abelian subgroup
of the elements of $GF(p^m)$. Hence multiplying each element of $GF^2(p^m)$
by an element of $GF(p^m)$ which is not an element of $GF^2(p^m)$ yields the
complementary set to $GF^2(p^m)$.

It is clear from Lemma 2 that when p = 2 the set complementary to $GF^2(p^m)$
is the null set.
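
For a prime field GF(p) with p odd, the coset argument of Lemma 6 can be seen directly. The following sketch (hypothetical function names, arithmetic modulo a prime p only) multiplies the nonzero squares by a fixed element k and returns the resulting set.

```python
def nonzero_squares(p):
    """Nonzero squared elements of GF(p), p an odd prime."""
    return {(x * x) % p for x in range(1, p)}


def scaled_squares(p, k):
    """The set obtained by multiplying each nonzero square by k (mod p)."""
    return {(k * q) % p for q in nonzero_squares(p)}


if __name__ == "__main__":
    p = 7
    squares = nonzero_squares(p)
    print(sorted(squares))
    # 3 is not a square mod 7, so this gives the complementary set.
    print(sorted(scaled_squares(p, 3)))
    # 2 is a square mod 7, so this reproduces the squares themselves.
    print(sorted(scaled_squares(p, 2)))
```

When k is not a square the two sets are disjoint and together exhaust the nonzero elements; when k is a square the multiplication merely permutes the squares.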


4. Construction of main-effect plans.

Theorem 1. There exists a main-effect plan for $[2(s^n - 1)/(s - 1) - 1]$ factors,
each at $s = p^m$ levels, with $2s^n$ treatment combinations.

Proof. In order to facilitate the presentation of the proof of Theorem 1, let
n = 2. First construct an orthogonal main effect plan for $(s^2 - 1)/(s - 1)$
factors each at s levels in $s^2$ trials, represented by the two factors $X_1$ and $X_2$ and
their generalized interactions $X_1 + X_2, X_1 + 2X_2, \cdots, X_1 + (s - 1)X_2$,
where the coefficients $1, 2, \cdots, (s - 1)$ are elements of $GF(p^m)$, addition and
multiplication being performed within this field. To these add

$[(s^2 - 1)/(s - 1) - 1]$

factors represented by

$X_1^2 + X_2, X_1^2 + X_1 + X_2, X_1^2 + 2X_1 + X_2, \cdots, X_1^2 + (s - 1)X_1 + X_2$.

These $[2(s^n - 1)/(s - 1) - 1]$ factors in $s^n$ observations represent the first
half of the main-effect plan.

Note from the preceding lemmas that when p is a prime number, $X_1 + a_iX_2$
and $X_1^2 + k_iX_1 + X_2$ are semi-orthogonal and also that $X_2$ and $X_1^2 + k_iX_1 + X_2$
are semi-orthogonal for all $a_i$ and $k_i$ in $GF(p^m)$ except $a_i = 0$. All other pairs
of factors are clearly orthogonal. If p = 2 and $(k_i - a_1/a_2) = 0$, then
$a_1X_1 + a_2X_2$ and $X_1^2 + k_iX_1 + X_2$ are orthogonal.

The second half of the plan is chosen so that the pairs of factors which are
orthogonal in the first half are also orthogonal in the second half and pairs of
factors which are semi-orthogonal in the first half are semi-orthogonal in a
complementary manner in the second half. The factors in the second half which
correspond to the factors of the first half can be denoted by

$X_1, X_2, X_1 + X_2 + b_1, X_1 + 2X_2 + b_2, \cdots, X_1 + (s - 1)X_2 + b_{s-1}$,
$kX_1^2 + X_2, kX_1^2 + k_1X_1 + X_2 + c_1, kX_1^2 + k_2X_1 + X_2 + c_2, \cdots, kX_1^2 + k_{s-1}X_1 + X_2 + c_{s-1}$

where the coefficients $b_1, b_2, \cdots, b_{s-1}$, $k$, $k_1, k_2, \cdots, k_{s-1}$, $c_1, c_2, \cdots, c_{s-1}$,
which are to be determined, are elements of $GF(p^m)$.

From Lemma 5, it is seen that the levels of $X_1^2 + X_2$ which occur with the
$u_t$ level of $X_2$ are given by the values of $x_1^2 + u_t$ where $x_1$ takes on the values of
the elements of $GF(p^m)$. Without loss of generality we may let $u_t = u_0 = 0$.
When p is an odd prime, the values of $kX_1^2 + X_2$, where k is an element of
$GF(p^m)$ but not an element of $GF^2(p^m)$, which occur with the $u_t = 0$ level of
$X_2$ are given by the values of $kx_1^2$. As shown in Lemma 6, $kx_1^2$ complements $x_1^2$.

Thus, when p is an odd prime k can take on the value of any element in $GF(p^m)$
which is not an element of $GF^2(p^m)$. If p = 2 it is clear from Lemma 2 that k = 1.

A method for determining the constants $b_1, b_2, \cdots, b_{s-1}$, $k_1, k_2, \cdots, k_{s-1}$,
$c_1, c_2, \cdots, c_{s-1}$, when $s = p^m$ and p is an odd prime is now presented. In order
that the levels of $kX_1^2 + X_2$ which occur with the 0 level of $X_1 + a_iX_2 + b_i$ be
the complements of the levels of $X_1^2 + X_2$ which occur with the 0 levels of
$X_1 + a_iX_2$, $b_i$ must be such that the values which $kx_1^2 - (1/a_i)x_1 - b_i/a_i$ takes when
$x_1$ ranges over the field $GF(p^m)$ complement the values which $x_1^2 - (1/a_i)x_1$
takes. Now $x_1^2 - (1/a_i)x_1$ consists of one element of $GF(p^m)$ occurring once
and $\frac{1}{2}(s - 1)$ elements occurring twice. Let the unique element of $GF(p^m)$
be $u_1$. Then $x_1^2 - (1/a_i)x_1 = u_1$ must have only one solution as $x_1$ ranges over
the elements of $GF(p^m)$. Thus $1/a_i^2 + 4u_1 = 0$ and hence $4u_1 = -1/a_i^2$. Since
$kx_1^2 - (1/a_i)x_1 - b_i/a_i$ must complement $x_1^2 - (1/a_i)x_1$, the equation

$kx_1^2 - (1/a_i)x_1 - b_i/a_i = u_1$

must also have only one solution. Therefore

$1/a_i^2 + 4k(b_i/a_i + u_1) = 0$.

Substituting $4u_1 = -1/a_i^2$ in this equation and solving for $b_i$ we get

(1) $b_i = (k - 1)/4ka_i$.

To find the levels of $X_1^2 + d_iX_1 + X_2$ which occur with the 0 levels of $X_2$ note
that there exists an element of $GF(p^m)$, $u_2$ say, such that $x_1^2 + d_ix_1 = u_2$ has
only one solution.

Thus $d_i^2 + 4u_2 = 0$ and hence $4u_2 = -d_i^2$. In order that the levels of
$kX_1^2 + k_iX_1 + X_2 + c_i$ which occur with the 0 levels of $X_2$ complement those given by
$x_1^2 + d_ix_1$, then $kx_1^2 + k_ix_1 + c_i = u_2$ must have only one solution. Substituting
$4u_2 = -d_i^2$ in this equation and solving for $c_i$ we get

(2) $c_i = k_i^2/4k - d_i^2/4$.


To find the levels of $X_1^2 + d_iX_1 + X_2$ which occur with the 0 levels of
$X_1 + a_iX_2$ note that there exists an element of $GF(p^m)$, $u_3$ say, such that
$x_1^2 + (d_i - 1/a_i)x_1 = u_3$ has only one solution. Thus

$(d_i - 1/a_i)^2 + 4u_3 = 0$ and $4u_3 = -(d_i - 1/a_i)^2$.

Since $kx_1^2 + (k_i - 1/a_i)x_1 + (c_i - b_i/a_i)$ must complement $x_1^2 + (d_i - 1/a_i)x_1$,
the equation

$kx_1^2 + (k_i - 1/a_i)x_1 + (c_i - b_i/a_i) = u_3$

must also have only one solution as $x_1$ ranges over the elements of $GF(p^m)$.
Therefore

$(k_i - 1/a_i)^2 - 4k[(c_i - b_i/a_i) - u_3] = 0$.

Substituting $4u_3 = -(d_i - 1/a_i)^2$ and equations (1) and (2) into this equation
we get

(3) $k_i = kd_i$.

Hence equation (2) can be rewritten as

(4) $c_i = d_i^2(k - 1)/4$.

Thus k is determined by choosing an element of $GF(p^m)$ which is not an
element of $GF^2(p^m)$. By letting $a_i = 1, 2, \cdots, s - 1$ we can determine
$b_1, b_2, \cdots, b_{s-1}$ from equation (1). Then setting $d_i = 1, 2, \cdots, s - 1$
we determine $k_1, k_2, \cdots, k_{s-1}$ from equation (3) and $c_1, c_2, \cdots, c_{s-1}$ from
equation (4).

The procedure employed above cannot be applied when p = 2 since $x_1^2 + cx_1$
consists of $\frac{1}{2}s$ elements of $GF(2^m)$, each occurring twice. Thus there exists no
element u such that $x_1^2 + cx_1 = u$ has only one solution.

We deduce from Lemma 2 that when p = 2, then k = 1. In order that the
levels of $X_1^2 + X_2$ which occur with the 0 levels of $X_1 + a_iX_2 + b_i$ ($a_i = 1, 2, 3, \cdots, s - 1$)
complement the levels of $X_1^2 + X_2$ which occur with the 0 levels of
$X_1 + a_iX_2$ then the levels given by $x_1^2 - (1/a_i)x_1 - b_i/a_i$ must complement the levels
given by $x_1^2 - (1/a_i)x_1$ when $x_1$ ranges over $GF(2^m)$. It is easily verified that
$b_i$ can be any one of the $2^{m-1}$ elements of $GF(2^m)$ which are not given by
$x_1^2 - (1/a_i)x_1$.

In order that the levels of $X_1^2 + k_iX_1 + X_2 + c_i$ which occur with the 0 levels
of $X_2$ complement the levels of $X_1^2 + d_iX_1 + X_2$ which occur with the 0 levels of
$X_2$, then the values given by $x_1^2 + k_ix_1 + c_i$ must complement the values given
by $x_1^2 + d_ix_1$. It can be shown that $k_i = d_i$ and $c_i$ can be any one of the $2^{m-1}$
elements of $GF(2^m)$ which are not given by the values of $x_1^2 + d_ix_1$.
By finding the values of $X_1^2 + k_iX_1 + X_2 + c_i$ which occur with the 0 levels
of $X_1 + a_iX_2 + b_i$ and which complement the values of $X_1^2 + d_iX_1 + X_2$ that
occur with the 0 levels of $X_1 + a_iX_2$, a set of $b_i$ and $c_i$ which satisfy all the
requirements to have the second half of the plan complement the first half of the
plan can be determined.

When n > 2 the same procedures will yield the desired plans if Lemma 5a
is utilized in place of Lemma 5. Thus the theorem is proved.


5. Examples. Some of the more useful orthogonal arrays which can be constructed
by the method presented are: (18, 7, 3, 2), (54, 25, 3, 2), (32, 9, 4, 2),
(128, 41, 4, 2), (50, 11, 5, 2), (250, 61, 5, 2), (98, 15, 7, 2), (128, 17, 8, 2) and
(162, 19, 9, 2).

Bose and Bush [6] have constructed the arrays (18, 7, 3, 2) and (32, 9, 4, 2)
by other procedures and have shown that $[2(s^n - 1)/(s - 1) - 1]$ is the maximum
number of constraints that arrays of size $2s^n$ can accommodate.

We now present two examples of the construction of main effect plans for
$[2(s^n - 1)/(s - 1) - 1]$ factors each at $s = p^m$ levels with $2s^n$ treatment
combinations. The first example illustrates the construction of a plan for eleven
factors, each at five levels with fifty treatment combinations. This plan is the
orthogonal array (50, 11, 5, 2).

The eleven factors which represent the first twenty-five treatment combinations
are denoted by $X_1$, $X_2$, $X_1 + X_2$, $X_1 + 2X_2$, $X_1 + 3X_2$, $X_1 + 4X_2$,
$X_1^2 + X_2$, $X_1^2 + X_1 + X_2$, $X_1^2 + 2X_1 + X_2$, $X_1^2 + 3X_1 + X_2$ and
$X_1^2 + 4X_1 + X_2$. The corresponding eleven factors representing the second
half of the plan are denoted by $X_1$, $X_2$, $X_1 + X_2 + b_1$, $X_1 + 2X_2 + b_2$,
$X_1 + 3X_2 + b_3$, $X_1 + 4X_2 + b_4$, $kX_1^2 + X_2$, $kX_1^2 + k_1X_1 + X_2 + c_1$,
$kX_1^2 + k_2X_1 + X_2 + c_2$, $kX_1^2 + k_3X_1 + X_2 + c_3$ and $kX_1^2 + k_4X_1 + X_2 + c_4$.

The elements of GF(5) are 0, 1, 2, 3 and 4. Hence the elements of $GF^2(5)$
are 0, 1, 4, 4, 1. From Lemma 6, therefore, k = 2 or k = 3. Let us choose k = 3.
Hence, from equation (1)

$b_i = 1/a_i$.

Thus, when

$a_1 = 1$, then $b_1 = 1$
$a_2 = 2$, then $b_2 = 3$
$a_3 = 3$, then $b_3 = 2$
$a_4 = 4$, then $b_4 = 4$.

Now, from equations (3) and (4)

$k_i = 3d_i$ and $c_i = 3d_i^2$.

Thus, when

$d_1 = 1$, then $k_1 = 3$, $c_1 = 3$
$d_2 = 2$, then $k_2 = 1$, $c_2 = 2$
$d_3 = 3$, then $k_3 = 4$, $c_3 = 2$
$d_4 = 4$, then $k_4 = 2$, $c_4 = 3$.

The eleven factor representations for the second half of the plan are therefore
given by: $X_1$, $X_2$, $X_1 + X_2 + 1$, $X_1 + 2X_2 + 3$, $X_1 + 3X_2 + 2$, $X_1 + 4X_2 + 4$,
$3X_1^2 + X_2$, $3X_1^2 + 3X_1 + X_2 + 3$, $3X_1^2 + X_1 + X_2 + 2$, $3X_1^2 + 4X_1 + X_2 + 2$
and $3X_1^2 + 2X_1 + X_2 + 3$.
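
The construction for n = 2 and s = p an odd prime can be carried out mechanically from equations (1), (3) and (4). The sketch below is an illustration under stated assumptions, not the authors' own computation: it handles only prime fields (arithmetic modulo p), the function names `nonresidue`, `coefficients`, `main_effect_plan` and `is_strength_two` are hypothetical, and k defaults to the smallest element that is not a square. It builds both halves of the plan and checks the defining property of an orthogonal array of strength two by counting level pairs.

```python
from collections import Counter
from itertools import combinations, product


def nonresidue(p):
    """Smallest element of GF(p) that is not a square (p an odd prime)."""
    squares = {(x * x) % p for x in range(p)}
    return next(k for k in range(2, p) if k not in squares)


def coefficients(p, k):
    """Constants b_i, k_i, c_i from equations (1), (3), (4), for a_i = d_i = 1..p-1."""
    def inv(x):
        return pow(x % p, -1, p)  # modular inverse, Python 3.8+

    b = [((k - 1) * inv(4 * k * a)) % p for a in range(1, p)]
    ks = [(k * d) % p for d in range(1, p)]
    c = [(d * d * (k - 1) * inv(4)) % p for d in range(1, p)]
    return b, ks, c


def main_effect_plan(p, k=None):
    """Rows of the 2p^2-run plan with 2p + 1 factors, each at p levels."""
    if k is None:
        k = nonresidue(p)
    b, ks, c = coefficients(p, k)
    rows = []
    for x1, x2 in product(range(p), repeat=2):  # first half
        row = [x1, x2]
        row += [(x1 + a * x2) % p for a in range(1, p)]
        row += [(x1 * x1 + d * x1 + x2) % p for d in range(p)]
        rows.append(row)
    for x1, x2 in product(range(p), repeat=2):  # second half
        row = [x1, x2]
        row += [(x1 + a * x2 + b[a - 1]) % p for a in range(1, p)]
        row.append((k * x1 * x1 + x2) % p)
        row += [(k * x1 * x1 + ks[d - 1] * x1 + x2 + c[d - 1]) % p
                for d in range(1, p)]
        rows.append(row)
    return rows


def is_strength_two(rows, s):
    """True if every pair of columns shows each of the s^2 level pairs equally often."""
    target = len(rows) // (s * s)
    for i, j in combinations(range(len(rows[0])), 2):
        counts = Counter((r[i], r[j]) for r in rows)
        if len(counts) != s * s or set(counts.values()) != {target}:
            return False
    return True


if __name__ == "__main__":
    print(coefficients(5, 3))
    plan = main_effect_plan(5, k=3)
    print(len(plan), len(plan[0]), is_strength_two(plan, 5))
```

With p = 5 and k = 3 the constants agree with those obtained by hand above, and the first half taken alone fails the strength-two check, which is why the complementary second half is needed.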

The second example will illustrate the construction of the plan for nine factors 
each at four levels with thirty-two treatment combinations. This plan is the 
orthogonal array (32, 9, 4, 2). 

The nine factors which represent the first sixteen treatment combinations 
are denoted by X; , X2, Xi + X2, X1 + 2X2, Xi + 3X2, Xi + Xo, 

Xi + X, + X2, Xi + 2X, + X. and Xj + 3X, + X;. The corresponding nine 
factors representing the second half of the plan are denoted by Xi, X2, 

Xi + X2 +b, Xi + 2X2 + by Xi + 3X2 + b3, Xi + X2,.Xi+hXi+ Xs+a, 
Xi + keX, + Xo + co and Xj + kX, + X. + c,. The coefficients are elements 
of GF (2°) and all additions and multiplications are performed within this field. 

Solving for $b_1$ so that the levels of $X_1^2 + X_2$ which occur with the 0 level of
$X_1 + X_2 + b_1$ complement the levels of $X_1^2 + X_2$ which occur with the 0 level
of $X_1 + X_2$, we find that $b_1 = 2$ or $b_1 = 3$. Similarly we find that $b_2 = 1$ or
$b_2 = 2$ and that $b_3 = 1$ or $b_3 = 3$.

As we wish the levels of $X_1^2 + k_iX_1 + X_2 + c_i$ which occur with the 0 level
of $X_1 + a_jX_2 + b_j$ to be complements of the levels of $X_1^2 + k_iX_1 + X_2$ which
occur with the 0 level of $X_1 + a_jX_2$, we find that

$k_i = i$,
$b_1 + c_2 = 1$ or 3, $b_1 + c_3 = 1$ or 2, $3b_2 + c_1 = 1$ or 2,
$3b_2 + c_2 = 2$ or 3, $2b_3 + c_1 = 1$ or 3 and $2b_3 + c_3 = 2$ or 3.

Values of $b_1$, $b_2$, $b_3$, $c_1$, $c_2$ and $c_3$ which are consistent with all the above
equations are $b_1 = c_1 = 2$, $b_2 = c_2 = 1$ and $b_3 = c_3 = 3$. A second set of solutions
is $b_1 = c_1 = 3$, $b_2 = c_2 = 2$ and $b_3 = c_3 = 1$. These are the only two possible sets
of solutions for this plan.

Since the coefficients satisfy all the properties required to make the second
half of the plan complement the first half, every pair of factors is orthogonal.
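
The second example can be checked in the same way. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the elements of GF($2^2$) are coded 0, 1, 2, 3 with addition taken as bitwise XOR and multiplication given by $2 \cdot 2 = 3$, $2 \cdot 3 = 1$, $3 \cdot 3 = 2$, and $k_i = i$ is fixed. It searches all 4096 choices of $(b_1, b_2, b_3, c_1, c_2, c_3)$ and keeps those for which the 32 treatment combinations form an orthogonal array of strength two.

```python
from itertools import combinations, product
from collections import Counter

# GF(4) coded as 0, 1, 2, 3: addition is XOR, multiplication uses this table.
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]


def mul(a, b):
    return MUL[a][b]


def build_plan(b, c):
    """32 x 9 plan; b and c map i = 1, 2, 3 to the constants b_i and c_i."""
    rows = []
    for x1, x2 in product(range(4), repeat=2):      # first half
        sq = mul(x1, x1)
        rows.append([x1, x2]
                    + [x1 ^ mul(a, x2) for a in (1, 2, 3)]
                    + [sq ^ x2]
                    + [sq ^ mul(i, x1) ^ x2 for i in (1, 2, 3)])
    for x1, x2 in product(range(4), repeat=2):      # second half, k_i = i
        sq = mul(x1, x1)
        rows.append([x1, x2]
                    + [x1 ^ mul(a, x2) ^ b[a] for a in (1, 2, 3)]
                    + [sq ^ x2]
                    + [sq ^ mul(i, x1) ^ x2 ^ c[i] for i in (1, 2, 3)])
    return rows


def is_orthogonal_array(plan, levels=4, index=2):
    """True if every pair of columns shows each level pair `index` times."""
    for i, j in combinations(range(len(plan[0])), 2):
        counts = Counter((row[i], row[j]) for row in plan)
        if len(counts) != levels ** 2 or set(counts.values()) != {index}:
            return False
    return True


def all_solutions():
    """Every (b1, b2, b3, c1, c2, c3) that yields an orthogonal array."""
    found = []
    for values in product(range(4), repeat=6):
        b = dict(zip((1, 2, 3), values[:3]))
        c = dict(zip((1, 2, 3), values[3:]))
        if is_orthogonal_array(build_plan(b, c)):
            found.append(values)
    return found


solutions = all_solutions()
print(solutions)
```

Under this coding of the field, the search is expected to return exactly the two sets of constants quoted above.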


6. Some useful orthogonal arrays. In this final section we present the factors 
which represent the first and second halves of the arrays (18, 7, 3, 2) and (54, 
25, 3, 2) and the treatment combinations which constitute the array (50, 11, 5, 2). 


The factors representing the first half of the orthogonal array (18, 7, 3, 2) are:

$X_1$, $X_2$, $X_1 + X_2$, $X_1 + 2X_2$, $X_1^2 + X_2$, $X_1^2 + X_1 + X_2$, $X_1^2 + 2X_1 + X_2$.

The factors representing the second half of this array are:

$X_1$, $X_2$, $X_1 + X_2 + 2$, $X_1 + 2X_2 + 1$, $2X_1^2 + X_2$, $2X_1^2 + 2X_1 + X_2 + 1$,
$2X_1^2 + X_1 + X_2 + 1$.

The factors representing the first half of the orthogonal array (54, 25, 3, 2) are:

$X_1$, $X_2$, $X_1 + X_2$, $X_1 + 2X_2$, $X_3$, $X_1 + X_3$, $X_1 + 2X_3$, $X_2 + X_3$, $X_2 + 2X_3$,
$X_1 + X_2 + X_3$, $X_1 + X_2 + 2X_3$, $X_1 + 2X_2 + X_3$, $X_1 + 2X_2 + 2X_3$,
$X_1^2 + X_2$, $X_1^2 + X_1 + X_2$, $X_1^2 + 2X_1 + X_2$, $X_1^2 + X_3$, $X_1^2 + X_1 + X_3$,
$X_1^2 + 2X_1 + X_3$, $X_1^2 + X_2 + X_3$, $X_1^2 + X_1 + X_2 + X_3$, $X_1^2 + 2X_1 + X_2 + X_3$,
$X_1^2 + X_2 + 2X_3$, $X_1^2 + X_1 + X_2 + 2X_3$, $X_1^2 + 2X_1 + X_2 + 2X_3$.

The factors representing the second half of this array are:

$X_1$, $X_2$, $X_1 + X_2 + 2$, $X_1 + 2X_2 + 1$, $X_3$, $X_1 + X_3 + 2$, $X_1 + 2X_3 + 1$,
$X_2 + X_3$, $X_2 + 2X_3$, $X_1 + X_2 + X_3 + 2$, $X_1 + X_2 + 2X_3 + 2$,
$X_1 + 2X_2 + X_3 + 1$, $X_1 + 2X_2 + 2X_3 + 1$,
$2X_1^2 + X_2$, $2X_1^2 + 2X_1 + X_2 + 1$, $2X_1^2 + X_1 + X_2 + 1$, $2X_1^2 + X_3$,
$2X_1^2 + 2X_1 + X_3 + 1$, $2X_1^2 + X_1 + X_3 + 1$, $2X_1^2 + X_2 + X_3$,
$2X_1^2 + 2X_1 + X_2 + X_3 + 1$, $2X_1^2 + X_1 + X_2 + X_3 + 1$, $2X_1^2 + X_2 + 2X_3$,
$2X_1^2 + 2X_1 + X_2 + 2X_3 + 1$, $2X_1^2 + X_1 + X_2 + 2X_3 + 1$.


The factors representing the orthogonal array (50, 11, 5, 2) were deduced in 
Section 5. The following fifty treatment combinations constitute a main-effect 
plan for eleven factors each at five levels and the array (50, 11, 5, 2). The treat- 
ment combinations are divided into two sets of twenty-five, the first set being 
the first half of the plan and the second set being the second half of the plan 
(see Table 1 following references).


ON A GEOMETRICAL METHOD OF CONSTRUCTION OF PARTIALLY
BALANCED DESIGNS WITH TWO ASSOCIATE CLASSES


By ESTHER SEIDEN
Michigan State University


1. Introduction. The method of construction of partially balanced block
designs discussed here is different from the ones known in the literature. It is
based on the existence of an oval (maximum number of points no three on one
line) in finite Desarguesian planes. This method can be applied to any plane
of order $2^h$, h a positive integer, i.e., to planes with $2^h + 1$ points on a line. No
general procedure has thus far been obtained for planes with $p^n + 1$ points
on a line, p an odd prime and n a positive integer. A design based on a plane
with 10 points on a line will be constructed. Further generalizations of this
method will be discussed later.


2. Construction of partially balanced designs based on finite Desarguesian
planes with $2^h + 1$ points on a line. The total number of points in a plane with
$2^h + 1$ points on a line is $2^{2h} + 2^h + 1$. Furthermore it is well known that such
planes include ovals consisting of the maximum possible number of points,
$2^h + 2$. The lines of the plane can be classified into two categories with respect
to the oval. The first category includes lines having two points of the oval,
henceforth called secants. The second category of lines consists of lines not
including any point of the oval. The number of lines belonging to each of the two
categories is clearly $(2^{h-1} + 1)(2^h + 1)$ and $2^{2h-1} - 2^{h-1}$ respectively. Consider
now the points of the plane which are not on the oval. Their number is $2^{2h} - 1$.
Each of them lies on $2^{h-1} + 1$ secants and $2^{h-1}$ lines of the second category. This
leads to a construction of partially balanced block designs identifying the points
with the objects and the lines with the blocks. Each of the two categories of
lines taken separately gives rise to a partially balanced block design. The first
design will be obtained by calling two objects first associates if the points
representing them lie on one secant, second associates otherwise. The second design
will be obtained by interchanging the roles of the two categories of lines.

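The parameters of the first design are as follows:

$v = 2^{2h} - 1$, $b = (2^{h-1} + 1)(2^h + 1)$, $r = 2^{h-1} + 1$, $k = 2^h - 1$,
$\lambda_1 = 1$, $\lambda_2 = 0$, $n_1 = 2^{2h-1} - 2$, $n_2 = 2^{2h-1}$.

(The matrices $P_1$ and $P_2$ are illegible in the source.)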

The parameters of the second design obtained by identifying the blocks with
the lines not including points of the oval are

$v = 2^{2h} - 1$, $b = 2^{2h-1} - 2^{h-1}$, $r = 2^{h-1}$, $k = 2^h + 1$,
$\lambda_1 = 1$, $\lambda_2 = 0$, $n_1 = 2^{2h-1}$, $n_2 = 2^{2h-1} - 2$.

(The matrices $P_1$ and $P_2$ are illegible in the source.)

Let us illustrate the above described method by applying it to the case h = 2, i.e.,
to the finite projective plane with 5 points on a line. Take the oval represented
by the unruled conic $x^2 + yz = 0$ and the point of intersection of the tangents
to this conic, where x, y, z are the coordinates of a point in the plane. Let $\epsilon$ be a
primitive element of the corresponding Galois field. The six points of the oval
are then: (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), $(1, \epsilon, \epsilon^2)$, $(1, \epsilon^2, \epsilon)$. The remaining
fifteen points of the plane are: (0, 1, 1), $(0, 1, \epsilon)$, $(0, 1, \epsilon^2)$, (1, 0, 1), $(1, 0, \epsilon)$,
$(1, 0, \epsilon^2)$, (1, 1, 0), $(1, 1, \epsilon)$, $(1, 1, \epsilon^2)$, $(1, \epsilon, 0)$, $(1, \epsilon, 1)$, $(1, \epsilon, \epsilon)$, $(1, \epsilon^2, 0)$,
$(1, \epsilon^2, 1)$, $(1, \epsilon^2, \epsilon^2)$. Thus we can exhibit a partially balanced block design with
the following parameters: $v = 15$, $b = 15$, $r = 3$, $k = 3$, $\lambda_1 = 1$, $\lambda_2 = 0$, $n_1 = 6$,
$n_2 = 8$, $P_1 = \begin{pmatrix} 1 & 4 \\ 4 & 4 \end{pmatrix}$, $P_2 = \begin{pmatrix} 3 & 3 \\ 3 & 4 \end{pmatrix}$. If we identify the points with the consecutive
numbers 1 through 15 we obtain Table I, the plan of the design in question.
TABLE I
The Plan of the Design

(The entries of this table are illegible in the source.)
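
The case h = 2 is small enough to be recomputed directly. The following sketch is an illustration only: it codes the Galois field of four elements as 0, 1, 2, 3 (2 standing for the primitive element and 3 for its square, an assumed coding), builds the 21 points and 21 lines of the plane, forms the oval from the conic $x^2 + yz = 0$ together with the point (1, 0, 0), and derives the design whose blocks are the secants. The numbering of points it produces need not agree with that of Table I.

```python
from itertools import combinations, product

# GF(4) coded as 0, 1, 2, 3: addition is XOR, multiplication uses this table.
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]


def mul(a, b):
    return MUL[a][b]


def normalized_triples():
    """Nonzero triples over GF(4) whose first nonzero coordinate is 1."""
    return [v for v in product(range(4), repeat=3)
            if any(v) and next(t for t in v if t) == 1]


def incident(point, line):
    """A point lies on a line when the GF(4) inner product vanishes."""
    return (mul(point[0], line[0]) ^ mul(point[1], line[1])
            ^ mul(point[2], line[2])) == 0


points = normalized_triples()   # the 21 points of the plane
lines = normalized_triples()    # the 21 lines, in dual coordinates

conic = {p for p in points if (mul(p[0], p[0]) ^ mul(p[1], p[2])) == 0}
oval = conic | {(1, 0, 0)}      # conic plus the meet of its tangents
off_oval = [p for p in points if p not in oval]


def oval_points_on(line):
    return sum(incident(p, line) for p in oval)


secants = [l for l in lines if oval_points_on(l) == 2]
external = [l for l in lines if oval_points_on(l) == 0]

# First design: blocks are the off-oval points of each secant.
blocks = [frozenset(p for p in off_oval if incident(p, l)) for l in secants]
first = {p: set().union(*[b for b in blocks if p in b]) - {p} for p in off_oval}

# Number of common first associates, split by the class of the pair.
p1_11 = {len(first[x] & first[y])
         for x, y in combinations(off_oval, 2) if y in first[x]}
p2_11 = {len(first[x] & first[y])
         for x, y in combinations(off_oval, 2) if y not in first[x]}

print(len(oval), len(secants), len(external), p1_11, p2_11)
```

The counts of secants, external lines and first associates obtained this way can be compared with the parameters v, b, r, k, $n_1$ and $n_2$ quoted for h = 2.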


REMARK 1. The described method of construction of partially balanced block
designs also gives a method of construction of Hadamard matrices of order
$2^{2h}$, a method that differs from the one given by Paley [1]. The elements of these
matrices can be rearranged so that they have a constant element on the diagonal
and are symmetric about the main diagonal. They have further properties which
will not be described here. The association matrices of the design [2] yield such
matrices. Here is an example of such a matrix for h = 2 based on the partially
balanced block design with blocks represented by the second category of lines.
The diagonal elements are zeros and the below-diagonal part of the matrix is
shown in Table II.

REMARK 2. It seems worthwhile to point out that the method of construction
of partially balanced block designs was applied to Desarguesian planes only
because of the fact that they do include an oval consisting of $2^h + 2$ points. If
the same would hold with respect to non-Desarguesian planes, then conceivably
one could obtain, using the same method, designs that are in general
non-isomorphic with the ones already constructed.





TABLE II

1
11
111
1111
10100
100101
1000111
11000111
010100101
0010110100
01001001111
001101100110
0001101101111
01100100111110

3. Construction of a partially balanced block design based on planes
of order $p^n$. No general theory is yet available for construction of designs based
on projective planes with $p^n + 1$ points on a line, p an odd prime, n a positive
integer. An indication of the approach will be given by examining the case
p = 3, n = 2. The total number of points in this plane is 91; the oval consists
of 10 points. The 81 remaining points can be classified into five categories
depending on the number of secants passing through the points. It will be convenient
to call the lines including just one point of the oval tangents. Let us denote
the number of points in each of the five categories by x, y, z, u, w respectively.
The classification is summarized in Table III.


TABLE III

                                Lines passing through the point
Category   Number of points   Secants   Tangents   Lines not including
                                                   points of the oval
I          x                  5         0          5
II         y                  4         2          4
III        z                  3         4          3
IV         u                  2         6          2
V          w                  1         8          1


Clearly $x + y + z + u + w = 81$. Further equations are obtained counting the
number of intersections of tangents, secants, separately and the number of
those intersections of the secants with the tangents that are not points of the
oval. This yields the following equations:

$10x + 6y + 3z + u = 630$,
$y + 6z + 15u + 28w = 45$,
$2y + 3z + 3u + 2w = 90$.

It is easy to show that the only possible solution of the above equations in
non-negative integers is x = 36, y = 45, so that z = u = w = 0. This leads to a
partially balanced block design if we identify, e.g., the objects with the y's and
the blocks with the tangents excluding the points of the oval. The parameters of
the design are: $v = 45$, $b = 10$, $r = 2$, $n_1 = 16$, $n_2 = 28$, $\lambda_1 = 1$, $\lambda_2 = 0$, $k = 9$,
$P_1 = \begin{pmatrix} 8 & 7 \\ 7 & 21 \end{pmatrix}$, $P_2 = \begin{pmatrix} 4 & 12 \\ 12 & 15 \end{pmatrix}$. The plan of the design is shown in Table IV.


TABLE IV

1 2 3 4 5 6 7 8 9
1 10 11 12 13 14 15 16 17
2 10 18 19 20 21 22 23 24
3 11 18 25 26 27 28 29 30
4 12 19 25 31 32 33 34 35
5 13 20 26 31 36 37 38 39
6 14 21 27 32 36 40 41 42
7 15 22 28 33 37 40 43 44
8 16 23 29 34 38 41 43 45
9 17 24 30 35 39 42 44 45
It may be noticed that this design could be obtained without using any theory,
but the method will apply to more complicated cases.
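
Both steps of this section lend themselves to a direct check. The sketch below is an illustration under stated assumptions: it first enumerates the non-negative integer solutions of the four counting equations, and then rebuilds the design by treating each of the 45 objects as the meet of two of the ten tangents, numbered in the order in which the pairs of tangents are generated. That numbering is an assumption made here for convenience.

```python
from itertools import combinations, product


def category_counts():
    """Non-negative integer solutions (x, y, z, u, w) of the four equations."""
    found = []
    # y + 6z + 15u + 28w = 45 forces z <= 7, u <= 3 and w <= 1.
    for z, u, w in product(range(8), range(4), range(2)):
        y = 45 - 6 * z - 15 * u - 28 * w
        x = 81 - (y + z + u + w)
        if (y >= 0 and x >= 0
                and 10 * x + 6 * y + 3 * z + u == 630
                and 2 * y + 3 * z + 3 * u + 2 * w == 90):
            found.append((x, y, z, u, w))
    return found


# Objects: unordered pairs of the ten tangents, numbered 1..45.
pairs = list(combinations(range(1, 11), 2))
number = {pair: n for n, pair in enumerate(pairs, start=1)}
blocks = [sorted(number[p] for p in pairs if t in p) for t in range(1, 11)]

together = {frozenset(c) for b in blocks for c in combinations(b, 2)}


def associate_class(x, y):
    """1 if x and y share a block (first associates), otherwise 2."""
    return 1 if frozenset((x, y)) in together else 2


def p_matrix(x, y):
    """Counts of objects z by their classes relative to x and to y."""
    counts = [[0, 0], [0, 0]]
    for z in range(1, 46):
        if z != x and z != y:
            counts[associate_class(x, z) - 1][associate_class(y, z) - 1] += 1
    return tuple(map(tuple, counts))


P = {k: {p_matrix(x, y) for x, y in combinations(range(1, 46), 2)
         if associate_class(x, y) == k} for k in (1, 2)}

print(category_counts())
for block in blocks:
    print(*block)
print(P)
```

Each entry of `P` is a set, so a single element in it indicates that the corresponding counts do not depend on the pair of objects chosen.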


In conclusion I wish to thank R. C. Bose for a stimulating discussion which
led to this paper.


ON SOME METHODS OF CONSTRUCTION OF PARTIALLY BALANCED
ARRAYS


By I. M. CHAKRAVARTI


University of North Carolina


Summary. Partially balanced arrays are generalizations of orthogonal arrays. 
Multifactorial designs derived from partially balanced arrays require a reduced 
number of assemblies in order to accommodate a given number of factors. For 
instance, an orthogonal array of strength two, six symbols and four constraints, 
would require at least $2 \cdot 6^2 = 72$ assemblies. This is because there does not exist
a pair of mutually orthogonal Latin Squares of order six. But for the same situa- 
tion, a partially balanced array in 42 assemblies, is constructed in this paper. 
The method of construction is one of composition which utilizes the existence 
of a pairwise partially balanced incomplete block design and an orthogonal array. 


1. Introduction. Suppose $A = ((a_{ij}))$ is a matrix, $i = 1, \dots, m$, $j = 1, \dots, N$,
and the elements $a_{ij}$ of the matrix are symbols $0, 1, 2, \dots, s - 1$. Consider
the $s^t$ $1 \times t$ matrices $X' = (x_1, x_2, \dots, x_t)$ that can be formed by giving
different values to the $x_i$'s, $x_i = 0, 1, 2, \dots, s - 1$; $i = 1, \dots, t$. Suppose
associated with each $t \times 1$ matrix X there is a positive integer $\lambda(x_1, x_2, \dots, x_t)$
which is invariant under permutations of $(x_1, x_2, \dots, x_t)$. If, for every t-rowed
submatrix of A, the $s^t$ $t \times 1$ matrices X occur as columns $\lambda(x_1, x_2, \dots, x_t)$
times, then the matrix A is called a partially balanced array of strength t in N
assemblies, m constraints (or factors), s symbols (or levels) and the specified
$\lambda(x_1, x_2, \dots, x_t)$ parameters. When $\lambda(x_1, x_2, \dots, x_t) = \lambda$ for all
$(x_1, x_2, \dots, x_t)$, the array is called an orthogonal array.

Orthogonal arrays were defined in [6] and [7] and the construction of orthogonal
arrays was considered in [1], [2], [3], [6] and [7]. Partially balanced arrays were
defined in [5], where their use as multifactorial designs is also discussed.

In this paper, some methods of construction of partially balanced arrays are
considered. One of the methods is applicable when s = 2 and derives partially
balanced arrays from the well-known $\lambda - \mu - \nu$ configurations. The other method
is an extension of the Bose-Shrikhande [2] method of construction of orthogonal
arrays.


2. An example of a partially balanced array. Deleting the first three assemblies
and the last row from the orthogonal array A(18, 7, 3, 2) given in [1], one gets a
partially balanced array of strength two, s = 3 symbols and m = 6 constraints
in N = 15 assemblies. This array has the $\lambda(x_1, x_2)$ parameters

$\lambda(x_1, x_2) = 2$ if $x_1$ and $x_2$ are unlike,
               $= 1$ otherwise.

The orthogonal array and the derived partially balanced array are given in
Tables 1 and 2. The columns of the partially balanced array have been slightly
rearranged.


TABLE 1
Orthogonal Array A(18, 7, 3, 2)

(The entries of this table are illegible in the source.)


TABLE 2
Partially balanced array (15, 6, 3, 2)

(The entries of this table are illegible in the source.)


Suppose an orthogonal array A(N, m, s, t) of index $\lambda$ is resolvable into two
disjoint arrays. Further, let one of them be a partially balanced array or a
degenerate partially balanced array (a degenerate array being one which has
some but not all $\lambda(x_1, x_2, \dots, x_t)$ equal to zero), with $\lambda(x_1, x_2, \dots, x_t) < \lambda$
for all $t \times 1$ matrices X. Then the residual array is a partially balanced array
with $\lambda$-parameters equal to $\lambda - \lambda(x_1, x_2, \dots, x_t)$. This provides a
basis for the deletion process of deriving a partially balanced array from an
orthogonal array.
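
The deletion process can be illustrated on a small array. The sketch below is not the array of Tables 1 and 2; it uses a hypothetical orthogonal array of index 2 obtained by writing an A(9, 3, 3, 2) twice, with rows $x_1$, $x_2$ and $2x_1 + 2x_2$ modulo 3. The function `pba_parameters` is an illustrative helper that returns the $\lambda$-parameters of a strength-t array, or `None` when the counts are not invariant as the definition requires.

```python
from itertools import combinations, product
from collections import Counter


def pba_parameters(array, symbols, t):
    """Map each multiset of t symbols to its lambda, or None if inconsistent.

    `array` is a list of rows. Keys are sorted tuples, so the counts must be
    the same for every t-rowed submatrix and every permutation of the symbols.
    """
    params = {}
    for rows in combinations(range(len(array)), t):
        counts = Counter(zip(*(array[r] for r in rows)))
        for x in product(range(symbols), repeat=t):
            key = tuple(sorted(x))
            if params.setdefault(key, counts[x]) != counts[x]:
                return None
    return params


def to_rows(columns):
    return [list(r) for r in zip(*columns)]


# Hypothetical example: assemblies (columns) of an A(9, 3, 3, 2), used twice.
base = [[x1, x2, (2 * x1 + 2 * x2) % 3] for x1, x2 in product(range(3), repeat=2)]
columns = base + base                       # index 2, 18 assemblies

# Deleted part: one copy of each constant assembly (a degenerate array).
deleted = [[c, c, c] for c in range(3)]
residual = list(columns)
for col in deleted:
    residual.remove(col)                    # removes the first matching column

full = pba_parameters(to_rows(columns), 3, 2)
removed = pba_parameters(to_rows(deleted), 3, 2)
left = pba_parameters(to_rows(residual), 3, 2)
print(full)
print(removed)
print(left)
```

In this toy case the residual array has 15 assemblies, and its parameters are the index of the full array minus the parameters of the deleted part, as the paragraph above describes.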


3. Construction of partially balanced arrays for s = 2 from $\lambda - \mu - \nu$
configurations.

DEFINITION. A $\lambda - \mu - \nu$ configuration of m elements is defined [4] as the
configuration of m elements taken $\nu$ at a time so that each set of $\mu$ elements
shall occur together in just $\lambda$ of the sets.

Suppose there are $N_0$ sets of $\nu$ elements each in the configuration. Let $N_t$ denote
the number of sets each containing a fixed subset of t elements. Then it is easily
seen that

(3.1)   $N_t = \lambda \binom{m-t}{\mu-t} \Big/ \binom{\nu-t}{\mu-t}$,   $t = 0, 1, 2, \dots, \mu$.

Consider the matrix $A = ((a_{ij}))$ of order $m \times N_0$ derived from a $\lambda - \mu - \nu$
configuration of m elements in $N_0$ sets in the following manner: Let $a_1, a_2, \dots, a_m$
denote the m elements and $s_1, s_2, \dots, s_{N_0}$ denote the $N_0$ sets of the
configuration. Then let

$a_{ij} = 1$ if $a_i$ occurs in the set $s_j$,
       $= 0$ otherwise.


Consider a $\mu$-rowed submatrix of A with elements $a_{ij}$ as defined above. Amongst
the $N_0$ columns of the submatrix, a column matrix X, where its transpose
$X' = (x_1, x_2, \dots, x_\mu)$, $x_i = 0$ or 1, $i = 1, \dots, \mu$, occurs $\lambda(x_1, x_2, \dots, x_\mu)$
times. Specifically, let $x_i = 1$ for $i = 1, \dots, r$ and let $x_i = 0$ for $i = r + 1, \dots, \mu$
in X. Then it is easy to show that for such an X

(3.2)   $\lambda(x_1, x_2, \dots, x_\mu) = N_r - \binom{\mu-r}{1} N_{r+1} + \binom{\mu-r}{2} N_{r+2} - \cdots = (-1)^{\mu-r} \Delta^{\mu-r} N_r$,

where $\Delta$ stands for the symbol of finite difference,
$\Delta N_r = N_{r+1} - N_r$.

The value of $\lambda(x_1, x_2, \dots, x_\mu)$ depends only on the count r of unities in its argument
and hence it is invariant under permutation of its arguments.

Now provided $\lambda(x_1, x_2, \dots, x_\mu) > 0$ for all $2^\mu$ choices of X, we have

THEOREM 2.1. The existence of a $\lambda - \mu - \nu$ configuration of m elements with
$\lambda(x_1, x_2, \dots, x_\mu)$ all positive implies the existence of a partially balanced array
of strength $\mu$ with parameters s = 2 and $\lambda(x_1, x_2, \dots, x_\mu)$ as defined in (3.2).

Well known examples of $\lambda - \mu - \nu$ configurations are the triple systems,
quadruple systems, etc., which are defined in [4].
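
As a small numerical illustration of (3.1) and (3.2), consider a triple system on seven elements, i.e. a 1 - 2 - 3 configuration with m = 7. The seven triples used below are one standard choice and are an assumption of this sketch, as are the function names. The code compares the values predicted by (3.2) with the counts observed in every two-rowed submatrix of the incidence matrix.

```python
from itertools import combinations, product
from collections import Counter
from math import comb

LAM, MU, NU, M = 1, 2, 3, 7

# Assumed triple system on the elements 0..6: every pair lies in one triple.
SETS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]


def n_sets(t):
    """Formula (3.1): number of sets containing a fixed subset of t elements."""
    return LAM * comb(M - t, MU - t) // comb(NU - t, MU - t)


def predicted(r):
    """Formula (3.2) for a column with r unities among its MU entries."""
    return sum((-1) ** j * comb(MU - r, j) * n_sets(r + j)
               for j in range(MU - r + 1))


# Incidence matrix: rows are elements, columns are sets.
A = [[1 if i in s else 0 for s in SETS] for i in range(M)]


def observed():
    """Counts of each 0/1 column, grouped by its number of unities."""
    seen = {}
    for rows in combinations(range(M), MU):
        counts = Counter(zip(*(A[i] for i in rows)))
        for pattern in product((0, 1), repeat=MU):
            seen.setdefault(sum(pattern), set()).add(counts[pattern])
    return seen


print([n_sets(t) for t in range(MU + 1)])
print({r: predicted(r) for r in range(MU + 1)})
print(observed())
```

Since all predicted values are positive, the incidence matrix in this example is a partially balanced array of strength two in seven assemblies.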


4. An extension of the Bose-Shrikhande method of construction of orthogonal
arrays and its use in the construction of partially balanced arrays.
DEFINITION. A pairwise partially balanced design with parameters

$(v; k_1, k_2, \dots, k_m; b_1, b_2, \dots, b_m; n_1, n_2, \dots, n_t; \lambda_1, \lambda_2, \dots, \lambda_t)$

is defined as an arrangement of v varieties in blocks of m different sizes $k_1, k_2, \dots, k_m$,
there being $b_i$ blocks of size $k_i$, $\sum_{i=1}^{m} b_i = b$, satisfying the following
conditions:

(i) No block contains a single variety more than once.

(ii) With respect to any variety, the remaining $v - 1$ varieties fall into t
categories, there being $n_i$ varieties in the ith category, called the ith associates
of the variety; $\sum_{i=1}^{t} n_i = v - 1$.

(iii) Two varieties which are ith associates occur together in $\lambda_i$ blocks,
$i = 1, \dots, t$.

Then the following relations among the parameters hold,

(4.1)   $\sum_{i=1}^{m} b_i k_i (k_i - 1) = \sum_{i=1}^{t} v n_i \lambda_i = v \sum_{i=1}^{t} n_i \lambda_i$.
Suppose there exist the orthogonal arrays

$A_i(\lambda k_i^2, q_i, k_i, 2)$,   $i = 1, \dots, m$,

of strength two and index $\lambda$ and in $k_i$ symbols. Consider the pairwise partially
balanced design defined earlier. There are $b_i$ blocks each of size $k_i$. These $b_i$
blocks provide $b_i$ sets of $k_i$ symbols each. Using each set of $k_i$ symbols once in
the orthogonal array $A_i$, one gets $b_i$ such orthogonal arrays. If all such orthogonal
arrays are arranged side by side, then one gets a matrix A with number
of columns $N = \lambda \sum_{i=1}^{m} b_i k_i^2$ and number of rows $q = \min(q_1, q_2, \dots, q_m)$.
In the columns of any two-rowed submatrix of matrix A, every ordered pair
$(t_i, t_j)$ of two distinct symbols of varieties which are uth associates will occur
$\lambda\lambda_u$ times and every ordered pair $(t_i, t_i)$ of two like symbols will occur $\lambda r_i$ times, if
the variety $t_i$ occurs in $r_i$ blocks of the pairwise partially balanced design. Hence
we have


THEOREM 4.1. The existence of a pairwise partially balanced design with parameters
$(v; k_1, k_2, \dots, k_m; b_1, b_2, \dots, b_m; n_1, n_2, \dots, n_t; \lambda_1, \lambda_2, \dots, \lambda_t)$
and of the orthogonal arrays $A_i(\lambda k_i^2, q_i, k_i, 2)$, $i = 1, \dots, m$, imply the existence
of the partially balanced array of strength two in v symbols and
$q = \min(q_1, q_2, \dots, q_m)$ constraints and $\lambda(x_1, x_2) = \lambda\lambda_i$, where $x_1$, $x_2$ stand for two
varieties which are ith associates, and $\lambda(x, x) = \lambda r_x$, where the variety x occurs
$r_x$ times in the pairwise partially balanced design.

As an illustration, a partially balanced array which has been constructed
using the method described above is given below. This is a partially balanced
array in v = 6 symbols, N = 48 assemblies, m = 5 constraints and

$\lambda(x_1, x_2) = 2$ if $(x_1, x_2)$ are first associates,
               $= 1$ if $(x_1, x_2)$ are second associates,
               $= 2$ if $x_1$ and $x_2$ are like,

where $x_i$, $i = 1, \dots, 6$ are the variety symbols. In constructing this array, the
partially balanced design (v = 6, r = 2, k = 4, b = 3, ...) in three blocks ...
and the orthogonal array A(16, 5, 4, 2) have been used.


TABLE 3
Orthogonal Array A(16, 5, 4, 2)

(The entries of this table, and the three identifications of variety-symbols
that followed it, are illegible in the source.)
and using them on the above array in place of (0, 1, t, ’), one gets three arrays, 
$A_1$, $A_2$ and $A_3$. Then the array $A_0 = [A_1 \; A_2 \; A_3]$ is the desired partially balanced
array in 6 symbols and 48 assemblies. Let $A^*$ denote the array derived from A
by truncating the first row and the first four columns (as indicated by the
horizontal and vertical lines). Then the arrays $A_1^*$, $A_2^*$ and $A_3^*$ are obtained from $A^*$
using the three identifications of variety-symbols given above. Let E denote the
array

$x_1 \; x_2 \; \cdots \; x_6$
$x_1 \; x_2 \; \cdots \; x_6$
$x_1 \; x_2 \; \cdots \; x_6$
$x_1 \; x_2 \; \cdots \; x_6$

Then the array $A_0^* = [E \; A_1^* \; A_2^* \; A_3^*]$ is a partially balanced array in v = 6
symbols, N = 42 assemblies, m = 4 constraints and $\lambda(x_i, x_i) = 1$, $\lambda(x_i, x_j) = 2$ if
$x_i$ and $x_j$ are first associates and $\lambda(x_i, x_j) = 1$ if they are second associates.
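
The composition can be traced in a short computation. The sketch below rests on several assumptions that are not taken from the paper: the orthogonal array A(16, 5, 4, 2) is generated over the field of four elements from the rows $x_1$, $x_2$, $x_1 + x_2$, $2x_1 + x_2$, $3x_1 + x_2$; the varieties are coded 0 to 5; and the three blocks are the hypothetical choice (0, 1, 2, 3), (0, 1, 4, 5), (2, 3, 4, 5), for which 0 and 1, 2 and 3, 4 and 5 are first associates.

```python
from itertools import combinations, product
from collections import Counter

# GF(4) coded as 0, 1, 2, 3: addition is XOR, multiplication uses this table.
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]

# Assemblies (columns) of an A(16, 5, 4, 2); the first four have x1 = 0.
OA = [[x1, x2, x1 ^ x2, MUL[2][x1] ^ x2, MUL[3][x1] ^ x2]
      for x1, x2 in product(range(4), repeat=2)]

# Hypothetical blocks of the partially balanced design (an assumption).
BLOCKS = [(0, 1, 2, 3), (0, 1, 4, 5), (2, 3, 4, 5)]


def substitute(columns, block):
    """Replace the four array symbols by the varieties of one block."""
    return [[block[s] for s in col] for col in columns]


# Array in 48 assemblies and 5 constraints.
A0 = [col for blk in BLOCKS for col in substitute(OA, blk)]

# Truncated array: drop the first row and the first four columns, then
# prefix the array E whose columns repeat each variety four times.
A_star = [col[1:] for col in OA[4:]]
E = [[v] * 4 for v in range(6)]
A0_star = E + [col for blk in BLOCKS for col in substitute(A_star, blk)]


def kind(x, y):
    if x == y:
        return "like"
    return "first" if x // 2 == y // 2 else "second"


def lambda_parameters(columns):
    """Observed counts of ordered pairs, grouped by the kind of pair."""
    found = {}
    for i, j in combinations(range(len(columns[0])), 2):
        counts = Counter((c[i], c[j]) for c in columns)
        for x, y in product(range(6), repeat=2):
            found.setdefault(kind(x, y), set()).add(counts[(x, y)])
    return found


print(len(A0), lambda_parameters(A0))
print(len(A0_star), lambda_parameters(A0_star))
```

A single value in each set indicates that the count depends only on whether the two symbols are like, first associates or second associates.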


5. Acknowledgment. Thanks are due to Professor R. C. Bose for kindly
going through the manuscript and for his helpful comments.


SOME FURTHER DESIGNS OF TYPE O:PP


By G. H. FREEMAN


West African Cocoa Research Institute, Ibadan, Nigeria


0. Summary. A new method of deriving designs of type O:PP is described. 
The method gives rise to some designs previously obtained by other methods, 
and also to some entirely new designs. These new designs are described in detail 
and a worked example given. 


1. Introduction. Experimental designs with three non-interacting classifica- 
tions are most likely to be of use in two cases. The first is when the trial area 
has two physical configurations, such as rows and columns in a field; the second 
is when a new set of treatments is added to an existing block design and this set 
is, by its nature, unlikely to interact with the previous treatments. In either case, 
orthogonal designs, if available, are the best, but frequently the numbers of 
treatments and other classifications make complete orthogonality impossible. 
The problems of experimental design are the same in both cases, and suitable 
experimental designs using total or partial balance were considered by Hoblyn, 
et al. [4]. In their notation, if an experimental design has three classifications, 
rows, columns and treatments, such that the rows and columns are orthogonal 
to each other and the treatments are partially balanced with respect to both rows 
and columns, then the design is said to be of type O:PP.

Designs of type O:PP were discussed in more detail by Freeman [1], who
stated that the only practicable designs are those where the designs of type P
have two associate-classes only and these two associate-classes are the same for
each P. It appears, however, that useful O:PP designs with two associate-classes
can sometimes be derived from two designs of type P without these
restrictions. One possibility is that the two designs of type P have two
associate-classes each but that these are not the same for rows and columns; another is
that the two designs of type P each have three associate-classes. Since designs
with two associate-classes can be regarded as special cases of those with three
associate-classes in which some of the parameters are equal, the first of these
possibilities can perhaps be regarded as included within the second; nevertheless,
it is probably better to retain the distinction.

In order that the resultant O:PP design shall be analysable by the methods
previously given [1], the two designs of type P have to satisfy certain conditions.
In the O:PP design let there be n replicates of t treatments on r rows and c
columns such that each treatment occurs either f or f + 1 times in rows and
either g or g + 1 times in columns. Let the designs of type P for rows and columns
have three associate-classes, the same for each classification, with $n_i$ members in
the ith class. Amongst the extra occurrences let ith associates concur $\lambda_i$ times
in rows and $\mu_i$ times in columns, and write $\nu_i = r\lambda_i + c\mu_i$. Then, for the design
of type O:PP to have only two associate-classes, two of $\nu_1$, $\nu_2$ and $\nu_3$ must be
equal. If $\nu_i = \nu_j \neq \nu_k$, where i, j, k are 1, 2, 3 in some order, then, in the O:PP
design, any one treatment has $n_i + n_j$ associates of one kind and $n_k$ of the other.
Which of these are called first associates and which second is largely a matter of
convenience, but the same convention as for partially balanced designs should
be used when possible. It is also necessary for the parameters of the second
kind in the O:PP design to satisfy the usual equations for a partially balanced
design, but these parameters cannot in general be derived directly from those
of the two separate designs of type P. When the two designs of type P each have
only two associate-classes, these not being the same, this is represented by
$\lambda_3 = \lambda_1$, $\mu_3 = \mu_2$.


2. Possible designs. Freeman [3] gave a catalogue of useful designs derived in
the orthodox manner, "useful" designs being those having more than two
replicates or treatments, not more than 30 replicates, treatments, rows or columns,
and not more than 150 plots in all. The method of obtaining designs described
here does not lead to many useful new designs, while there are some which have
the same parameters as those derived by the old method. When these are
mentioned in this section they are numbered as in Freeman [3]. Thus, there is a
design with 8 replicates of 18 treatments on 12 rows and columns, $n_1 = 1$, $n_2 = 8$,
$n_3 = 8$, $\lambda_1 = 8$, $\lambda_2 = 6$, $\lambda_3 = 4$, $\mu_1 = 8$, $\mu_2 = 4$, $\mu_3 = 6$. Hence $\nu_1 = 192$, $\nu_2 =
\nu_3 = 120$, and so the design has the same parameters as SS VJ 1.
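
The arithmetic behind these values follows from $\nu_i = r\lambda_i + c\mu_i$ with r = c = 12. A minimal check, with variable names chosen here for illustration:

```python
# Rows, columns and the concurrence parameters quoted in the text.
r = c = 12
lam = (8, 6, 4)   # lambda_1, lambda_2, lambda_3
mu = (8, 4, 6)    # mu_1, mu_2, mu_3
n = (1, 8, 8)     # n_1, n_2, n_3

# nu_i = r * lambda_i + c * mu_i for each associate class.
nu = [r * l + c * m for l, m in zip(lam, mu)]
print(nu)
```

Two of the three values coincide, so the resulting O:PP design has only two associate-classes, with 1 associate of one kind and 8 + 8 = 16 of the other.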

There are two designs which may be of more use, and one has already been
used to design a trial in the field. Both use the principle of having singular
group-divisible designs for each of rows and columns, the resultant O:PP design being
of Latin square type. All useful Latin square designs so far discovered, whether
obtained by this method or the orthodox one, have equal numbers of rows and
columns. They are shown in Table I, which thus repeats some of the information
given previously [3]. The horizontal line through the middle of Table I separates
designs derived by the two methods. Above the line $\nu_1 = r\lambda_1 + c\mu_1$, $\nu_2 = r\lambda_2 +
c\mu_2$, as usual, while below the line $\nu_1 = r\lambda_1 + c\mu_1 = r\lambda_2 + c\mu_2$, $\nu_2 = r\lambda_3 + c\mu_3$,
that is, $\nu_2 = r\lambda_1 + c\mu_2$.


TABLE I
Useful designs in family LL

(The entries of this table are illegible in the source.)

The most general Latin square design with equal numbers of rows and columns
has $m^2$ replicates of $w^2$ treatments on mw rows and columns. When the designs
of type P are singular group-divisible a given treatment in them has $w - 1$ first
associates and $w(w - 1)$ second associates. Further, if fm < w < (f + 1)m,
$\lambda_1 = m^2 - fmw$, and so

$\lambda_2 = \frac{(m^2 - fmw)(m - fw - 1)}{w - 1}$,

in order to satisfy the usual constraint on parameters of the first kind. $\lambda_2$ has
to be integral, and this condition imposes some limitation on the possible values
of m and w. All singular group-divisible designs are derived from balanced
incomplete block designs by replacing one treatment of the balanced design by a
group of treatments, $\lambda_2$ in the partially balanced design equalling $\lambda$ in the balanced
design; it is easy to construct corresponding O:PP designs, which are very
numerous. Thus, any balanced incomplete block design with m replicates and
plots per block and w treatments and blocks gives rise to a design with the
required parameters. The smallest resultant O:PP design has 4 replicates of 9
treatments on 6 rows and columns, but this has the same parameters as LL 1.
The next in this series has 9 replicates of 16 treatments on 12 rows and columns,
and is shown as LL 4 in Table I. There is a group-divisible O:PP design, SS J 15,
with 9 replicates of 16 treatments, in 4 groups of 4, on 12 rows and columns, but
this has all other parameters different. Also, the design shown as LL 3, with the
same values of m and w, which is derived by orthodox methods, is new, although
it should have been included in the 1958 paper [3].

Another set of designs arises from the balanced incomplete block designs with
$m^2$ replicates of w treatments on mw blocks of m plots each. The smallest design
in this series has m = 2, w = 5, and gives rise to LL 5. This is particularly
noteworthy in that no other O:PP design is possible with these numbers of
rows, columns, treatments and replicates. It seems at first sight that there should
be a singular group-divisible O:PP design using the same designs of type P for
rows and columns, but this design is excluded by Theorem 1 [2]. This same
Theorem excludes singular group-divisible designs with the same parameters as
other Latin square designs in this series, for example, those with m = 3, w = 10,
and m = 3, w = 19.

When a $w \times w$ factorial experiment is to be laid out in rows and columns, a
Latin square O:PP design derived from two singular designs of type P may be
particularly suitable. The main effects of the two factors can be associated with
the rows and the columns, and the corresponding sums of squares in the analysis
take a fairly simple form. Competing designs for the same numbers of treatments
and replicates will include lattice squares, but the two designs may well require
different numbers of rows and columns, which are often pre-determined. Further,
two error variances have to be calculated in a lattice square, and only one in an
O:PP design. The example which follows illustrates the method of analysis for
a general O:PP design, with the modification required for a $w \times w$ factorial
experiment.

3. Example. The design LL 5 has been used for a trial on yams (Dioscorea
sp.) conducted by Mr. E. F. I. Baker, Research Division, Ministry of Agriculture
and Natural Resources, Western Region of Nigeria. Yams are grown from setts,
pieces of root tuber with adventitious buds, and the purpose of this trial was to
find the effect of planting setts of different weights at varying populations per
acre. A $5 \times 5$ factorial system was used, the levels of one factor being sett weights
per acre and of the other factor populations per acre. Representing the 25
treatment combinations by

A F K P U
B G L Q V
C H M R W
D I N S X
E J O T Y

the rows were lettered a, b, c, d, e and represented sett weights per acre, and
the columns, numbered 1, 2, 3, 4, 5, were populations per acre. Furthermore,
the treatments down the leading diagonal, A, G, M, S, Y, all had the same
individual sett weight, though varying in population and weight per acre. The
lay-out of the O:PP design in the field, after randomisation of rows and columns,
was that shown in Table II.


TABLE II

(The entries of this table are illegible in the source.)


It is seen that all sett weights per acre in two populations were arranged in
each column, these being shown as 1 5, 3 4, etc., while all populations with two
sett weights per acre occurred in each row, these being d e, c d, etc. First
associates of any treatment were thus those treatments with either the same population
or the same sett weight per acre.

The analysis follows the lines previously given [1], the notation used being the
same. Thus, in the O:PP design,

$(p^1_{ij}) = \begin{pmatrix} 3 & 4 \\ 4 & 12 \end{pmatrix}$,   $(p^2_{ij}) = \begin{pmatrix} 2 & 6 \\ 6 & 9 \end{pmatrix}$,   $N = n(cr - r - c) = 320$.


If treatment, row and column totals are represented by T, B and C respectively,
G being the grand total, the treatment total for A adjusted for rows and
columns is $P_A$, where


P, = 100T, — 10(B; + B; + B; + Bs) — 10(C, + Cs + Cy + Cy) + 4G, ete. 


Further, $A_{12} = 340$, $B_{12} = -30$, $A_{22} = -180$, $B_{22} = 310$, $\Delta = 100000$. Then,
the treatment parameter for A is $\hat{t}_A = (31P_A + 3\sum P_{A1})/10000$, where $\sum P_{A1}$
is the sum of the P's for the first associates of A.

The treatment sum of squares is $\sum \hat{t}P/100$, and row and column sums of
squares are unadjusted.

The variance of the difference between the means of two treatments that are 
ith associates is obtained by multiplying the error variance by yi , where yi = 


iH, va = BH. 

In this trial the main comparison of importance was between first associates, 
these being particular levels of one factor for a given level of the other. The 
most important comparison for second associates was that among the treatments 
A, G, M, 8, Y with the same individual sett weight. It was also necessary to 
consider the main effects of populations and sett weights per acre. The sum of 
squares for the main effect of populations has the simple form 


$$\sum (P_A + P_B + P_C + P_D + P_E)^2 / 125000$$


minus the correction factor, and similarly for sett weights. The multiplying 
factor for the error variance when comparing the means of any two levels of one 
factor, taken over all levels of the other, is 3x. 

SUFFICIENCY IN THE UNDOMINATED CASE 


By D. L. BURKHOLDER 
University of Illinois 


1. Introduction and summary. In this paper the concept of statistical suffi- 
ciency is studied within a general probability setting. It is not assumed that 
the family of probability measures is dominated. That is, it is not assumed that 
there is a σ-finite measure μ such that each probability measure in the family is 
absolutely continuous with respect to μ. In the dominated case, the theory of 
sufficiency has received a thorough-going and elegant treatment by Halmos and 
Savage [6], Bahadur [2], and others. Although many families of probability 
measures of importance for statistical work are dominated, many others are not. 
Nonparametric statistical work, especially, abounds with undominated families. 
It seems appropriate, therefore, to see what can be learned about sufficiency in 
the undominated case. 

Let X be a set, A a σ-field of subsets of X, and P a family of probability 
measures p on A. The probability structure (X, A, P) is to be kept in mind 
throughout the paper and is unrestricted except where specifically stated to the 
contrary. Any subfield (= sub-σ-field) entering the discussion is implicitly 
assumed to be a subfield of A. If H is a collection of subfields, let ∨H denote 
the smallest σ-field containing each member of H. If A_1, A_2, ··· are subfields, 
write A_1 ∨ A_2 for ∨{A_1, A_2}, ∨_{n=1}^∞ A_n for ∨{A_1, A_2, ···}, and so forth. A set 
N is P-null if N is p-null for each p in P, that is, if N is in A and p(N) = 0, p ∈ P. 
If f and g are A-measurable functions, write f = g[p] if the set {x | f(x) ≠ g(x)} 
is p-null and write f = g[P] if this set is P-null. Let N be the smallest σ-field 
containing the P-null sets. If A_1 and A_2 are subfields, write A_1 ⊂ A_2[P] if 
A_1 ⊂ A_2 ∨ N, and so forth. A subfield B is sufficient if, for each bounded 
A-measurable function f, there is a B-measurable function g such that 
∫_B f dp = ∫_B g dp, B ∈ B, p ∈ P, that is, such that g = E_p(f | B)[p], p ∈ P. 
Equivalent definitions are obtained if "bounded A-measurable function" is 
replaced by "A-measurable characteristic function" or by "P-integrable 
function." Of course, f is P-integrable if f is A-measurable and ∫_X |f| dp is 
finite for each p in P. A subfield B is separable if it contains a countable 
subcollection such that B is the smallest σ-field containing the subcollection. 

In Section 2, we give an example of a nonsufficient subfield containing a 
sufficient subfield. This solves a problem posed by Bahadur (Problem 1 on page 
441 of [2]). In fact, we show that often the collection of such nonsufficient sub- 
fields is much larger than the collection of sufficient subfields. Analogous results 
hold for statistics. Some of these and later results depend on Theorem 1 which 
gives a necessary condition for a subfield to be sufficient in the case that A is 
separable. 

Let A_1, A_2, ··· be a sequence of sufficient subfields. Are the subfields 
A_1 ∩ A_2, A_1 ∨ A_2, ∩_{n=1}^∞ A_n, and ∨_{n=1}^∞ A_n necessarily sufficient? This 
question is investigated in Sections 3 and 4. Using martingale theory, we show 
that, if the sequence is decreasing (increasing), then ∩_{n=1}^∞ A_n (∨_{n=1}^∞ A_n) is 
sufficient. If the sequence is not necessarily monotone, it is still possible to 
show that A_1 ∩ A_2 and ∩_{n=1}^∞ A_n are sufficient under a small extra assumption 
involving N. This result rests on a theorem proved in [3] regarding iterates of 
conditional expectation operators. One consequence of this result is of interest 
in connection with the theory of minimal sufficient subfields. It is not 
necessarily true that A_1 ∨ A_2 is sufficient. This is shown in Example 4. 
Conditions under which A_1 ∨ A_2 is sufficient are examined. 

The main result of Section 5 is related to Theorem 1 and indicates that if A 
is separable then each sufficient subfield is essentially equal to one of a very 
special type. 


2. On a problem of Bahadur. In [2], Bahadur proves that if the family P of 
probability measures on A is dominated, then a subfield of A containing a suffi- 
cient subfield is sufficient, and lists as an unsolved problem the question of 
whether this is true in general. That this is not true in general we now show by 
an example. 

EXAMPLE 1. Let X be the set of real numbers, A the collection of Borel subsets 
of X, and P the set of probability measures p on A satisfying p(A) = p(−A) 
for A in A. Here, if S ⊂ X then −S is the set {x | −x ∈ S}. Let A_0 = {A | A ∈ A, 
A = −A}. Clearly, A_0 is a subfield of A and if f is a bounded A-measurable 
function then 2g(x) = f(x) + f(−x) defines an A_0-measurable function g 
satisfying ∫_A f dp = ∫_A g dp, A ∈ A_0, p ∈ P. Hence, A_0 is sufficient. 
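
The symmetrization in Example 1 can be checked numerically on a finite stand-in. The sketch below is only an illustration under stated assumptions: the real line is replaced by the hypothetical set X = {−2, ..., 2}, the family P is sampled by a few random symmetric probability vectors, and the helper names (`symmetric_measure`, `symmetric_sets`, `symmetrize`, `check`) are ours. It confirms that g(x) = (f(x) + f(−x))/2 has the same integral as f over every symmetric set.

```python
from fractions import Fraction
from itertools import combinations
import random

# Hypothetical finite stand-in for the real line of Example 1.
X = [-2, -1, 0, 1, 2]

def symmetric_measure(rng):
    """Random probability vector on X with p(x) = p(-x), as exact fractions."""
    base = {x: rng.randint(1, 9) for x in X if x >= 0}
    total = sum(base[abs(x)] for x in X)
    return {x: Fraction(base[abs(x)], total) for x in X}

def symmetric_sets():
    """All sets A with A = -A, i.e. unions of the orbits {x, -x}."""
    orbits = [frozenset({x, -x}) for x in X if x >= 0]
    for k in range(len(orbits) + 1):
        for chosen in combinations(orbits, k):
            yield frozenset().union(*chosen)

def integral(h, A, p):
    """Integral of h over A with respect to p (a finite sum here)."""
    return sum(h(x) * p[x] for x in A)

def symmetrize(f):
    """g(x) = (f(x) + f(-x)) / 2, the candidate for E_p(f | A_0)."""
    return lambda x: Fraction(f(x) + f(-x), 2)

def check(f, trials=20, seed=0):
    """True if f and its symmetrization integrate alike over every symmetric set."""
    rng = random.Random(seed)
    g = symmetrize(f)
    return all(
        integral(f, A, p) == integral(g, A, p)
        for p in (symmetric_measure(rng) for _ in range(trials))
        for A in symmetric_sets()
    )

print(check(lambda x: x**3 + 2 * x**2 + 1))
```

The same g works for every sampled p, which is the point of sufficiency: the conditional expectation does not depend on which member of the family is in force.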

Suppose that S is a subset of X satisfying 0 ∈ S and S = −S. Let 

$$B = \{A \cup A_0 \mid A \subset S,\ A \in A,\ A_0 \in A_0\}. \tag{1}$$

Clearly, B satisfies A_0 ⊂ B ⊂ A. We now show that B is a σ-field. It is obvious 
that the union of a countable family of sets in B is in B. Let B ∈ B. Then there 
are sets A and A_0 satisfying B = A ∪ A_0, A ⊂ S, A ∈ A, A_0 ∈ A_0. Let C_0 = 
(−A) ∪ A and C = C_0 − A. Since S = −S, C_0 ⊂ S and therefore C ⊂ S. 
Using primes to denote complements we have that B' = A' ∩ A_0' = 
(C ∪ C_0') ∩ A_0' = (C ∩ A_0') ∪ (C_0' ∩ A_0') which is the union of a subset of S 
in A and a set in A_0. Therefore, B' ∈ B and B is a σ-field. 

Suppose that B is a sufficient subfield. Let f be a bounded A-measurable 
function. Then, since B is sufficient, there is a B-measurable function g 
satisfying 

$$\int_B f\,dp = \int_B g\,dp, \qquad B \in B,\ p \in P. \tag{2}$$

Let x ∈ S. Then {x} ∈ B and letting B = {x} in (2) gives 

$$f(x)\,p(x) = g(x)\,p(x), \qquad p \in P,$$

where we write p(x) for p({x}). Let x ∈ X − S. Then {x, −x} ∈ B but neither 
{x} nor {−x} belongs to B. Accordingly, g(x) = g(−x), since g is B-measurable. 
Letting B = {x, −x} in (2) and using the fact that p(x) = p(−x) gives 

$$(f(x) + f(-x))\,p(x) = 2g(x)\,p(x), \qquad p \in P.$$

If x ∈ X, then there is a p ∈ P such that p(x) > 0. Therefore, we have that 

$$g(x) = \begin{cases} f(x) & \text{if } x \in S, \\ \tfrac{1}{2}[f(x) + f(-x)] & \text{if } x \in X - S. \end{cases} \tag{3}$$

Let f(x) = −1 if x < 0, = 1 if x ≥ 0. Then f is A-measurable and the function 
g of (3) is B-measurable and satisfies g(x) ≠ 0 if x ∈ S, = 0 if x ∈ X − S. Thus, 
S = X − g^{-1}({0}) is in B. 

Now choose S to be a subset of X satisfying 0 ∈ S, S = −S, and S ∉ A. 
Such a set exists, of course. Then, if B is defined by (1), we see that B cannot be 
sufficient by the result of the above paragraph, for S does not belong to A and 
therefore does not belong to B. 

In summary, a subfield can contain a sufficient subfield and yet not be 
sufficient. We now prove several results which indicate that the probability 
structure examined in our example is by no means unusual in this respect. 

THEOREM 1. Suppose that A is separable. If B is a sufficient subfield, then there 
is a separable sufficient subfield B_0 satisfying 

$$B_0 \subset B \subset B_0 \vee N.$$

We recall that N is the smallest σ-field containing the P-null sets. If the only 
P-null set is the empty set, then N = {∅, X} ⊂ B_0 and B_0 ∨ N = B_0. Accordingly, 
the following result is an immediate consequence of Theorem 1. 

COROLLARY 1. Suppose that A is separable and the only P-null set is the empty 
set. If B is sufficient, then B is separable. 

PROOF OF THEOREM 1. Since A is separable, there is a countable field A_0 
such that A is the smallest σ-field containing A_0. Let B be a sufficient subfield. 
Then, if A ∈ A_0, there is a B-measurable function g_A such that p(A ∩ B) = 
∫_B g_A dp, B ∈ B, p ∈ P. Let B_0 be the smallest σ-field with respect to which each 
of the functions g_A, A ∈ A_0, is measurable. Since A_0 is countable, it is clear that 
B_0 is separable. Also, B_0 ⊂ B. 

Let A_1 be the collection such that A ∈ A_1 if and only if A ∈ A and there is a 
B_0-measurable function g satisfying 

$$p(A \cap B) = \int_B g\,dp, \qquad B \in B,\ p \in P. \tag{4}$$

Then A_0 ⊂ A_1 ⊂ A. Clearly, A_1 is a monotone class. Accordingly, A_1 = A since 
A is the smallest monotone class containing A_0. From the definition of A_1 and 
the relation B_0 ⊂ B, we conclude that B_0 is sufficient. 

We now show that B ⊂ B_0 ∨ N. Suppose that A ∈ B. Then A ∈ A = A_1 and 
there is a B_0-measurable function g satisfying (4). In particular, 

$$0 = p(A \cap (X - A)) = \int_{X-A} g\,dp, \qquad p \in P,$$

$$p(A) = p(A \cap A) = \int_A g\,dp, \qquad p \in P.$$

Therefore, if h is the characteristic function of A, we have that g = h[P], using 
the fact that 0 ≤ g ≤ 1[P], an immediate consequence of (4). Thus, h − g is 
N-measurable, and h = g + (h − g), being the sum of two B_0 ∨ N-measurable 
functions, is B_0 ∨ N-measurable. Consequently, A ∈ B_0 ∨ N. This completes 
the proof. 

In the following theorem let 

$$a_x = \bigcap \{A \mid x \in A \in A\}, \qquad a_{0x} = \bigcap \{A \mid x \in A \in A_0\},$$

where A_0 is a sufficient subfield. Let c be the cardinal number of the set of real 
numbers, c_0 the cardinal number of the collection of sufficient subfields, and c_1 
the cardinal number of the collection of subfields containing A_0 that are not 
sufficient. Since we now know that 0 < c_1 is possible, it will not be too surprising 
to find out that sometimes c_0 < c_1. 

THEOREM 2. Suppose that A is separable, A_0 is a sufficient subfield, the only 
P-null set is the empty set, and 

$$\operatorname{card}\,\{a_{0x} \mid x \in X,\ a_x \neq a_{0x}\} = c. \tag{5}$$

Then, 

$$c_0 \le c < 2^c \le c_1.$$

PROOF. By Corollary 1, each sufficient subfield must be separable. Therefore, 

$$c_0 \le \operatorname{card}\,\{B \mid B \text{ is a separable subfield}\}. \tag{6}$$

Since A is separable, card A ≤ c (see Problem 9 on page 26 of [5]). There is a 
one-to-one function from the set of separable subfields of A to the set of countable 
subcollections of A. If B is a separable subfield, the value at B of this function 
may be, for example, any particular countable subcollection of A such that B 
is the smallest σ-field containing the subcollection. Since card A ≤ c, the set of 
countable subcollections of A has cardinal number less than or equal to c. Thus, 
the right hand side of (6) is less than or equal to c, implying that c_0 ≤ c. 

We now show that 2^c ≤ c_2 where c_2 is the cardinal number of the collection of 
subfields containing A_0. Consequently, c_0 < c_2, c_1 = c_2 − c_0, c_1 = c_2, and 
2^c ≤ c_1. 

Let S be a subset of X such that X − S is the union of some subcollection of 
{a_{0x} | x ∈ X, a_x ≠ a_{0x}}. Clearly, the collection of such sets S has cardinal number 
greater than or equal to 2^c, using (5). Let 

$$B = \{A \cup A_0 \mid A \subset S,\ A \in A,\ A_0 \in A_0\}. \tag{7}$$

Since A is separable, if x ∈ X then a_x is the intersection of a countable number of 
sets in A and hence is in A. (Note that the partition {a_x} of X, induced by A, 
is also the partition induced by any field with the property that A is the smallest 
σ-field containing it. Here, since A is separable, a countable field with this 
property exists.) Since A_0 is sufficient, A_0 is also separable. Accordingly, 
a_{0x} ∈ A_0, x ∈ X. From these facts, it follows that 

$$S = \{x \mid a_x \in B\}. \tag{8}$$

For if x ∈ S then a_x ⊂ a_{0x} ⊂ S and a_x is the union of a subset of S in A, itself, 
with a set in A_0, the empty set, and hence is in B. If x is not in 
S then a_x ≠ a_{0x} ⊂ X − S, a_x is not in A_0 (for otherwise a_{0x} = a_x), and a_x does 
not have the right form to be a set in B. From (8) it follows that the mapping 
S → B described in (7) is one-to-one. Thus the cardinal number of the collection 
of B's is greater than or equal to 2^c. 

Let B be as in (7). Then A_0 ⊂ B ⊂ A. It remains to show that B is a σ-field. 
Let B ∈ B. Then there are sets A and A_0 satisfying B = A ∪ A_0, A ⊂ S, A ∈ A, 
A_0 ∈ A_0. Let C_0 = ∪{a_{0x} | x ∈ A} and C = C_0 − A. If C_0 is in A_0 then X − B 
is in B and B is a σ-field by the same reasoning as in Example 1. We now prove 
that C_0 is in A_0. Since A_0 is sufficient there is an A_0-measurable function g 
satisfying 

$$p(A \cap a_{0x}) = \int_{a_{0x}} g\,dp, \qquad x \in X,\ p \in P. \tag{9}$$

If x ∈ X then g is constant on a_{0x} since g is A_0-measurable and from (9) we have 
that 

$$p(A \cap a_{0x}) = g(x)\,p(a_{0x}), \qquad p \in P.$$

Since the only P-null set is the empty set, if A ∩ a_{0x} is empty then g(x) = 0 
and if A ∩ a_{0x} is nonempty then g(x) > 0. It is clear from the definition of C_0 
that if x ∈ C_0 then A ∩ a_{0x} is nonempty and if x is not in C_0 then A ∩ a_{0x} is empty. 
Thus, X − g^{-1}({0}) = C_0, implying that C_0 is in A_0. This completes the proof. 

REMARK 1. In Example 1, a_x = {x} and a_{0x} = {x, −x}, and it is clear that the 
conditions of Theorem 2 are satisfied. Many probability structures relevant for 
nonparametric statistical work satisfy the conditions, hence the conclusion, of 
Theorem 2. Among these, in addition to the one described in Example 1, the 
following is typical: 

EXAMPLE 2. Let n be an integer > 1, X Euclidean n-space, A the collection 
of Borel subsets of X, and P the set of all probability measures p on A of the 
form p = q × ··· × q, where q is a probability measure on the σ-field of Borel 
subsets of the real line. If x = (x_1, ···, x_n) ∈ X, let t_0(x) be the set of all points 
(x_{i_1}, ···, x_{i_n}), where (i_1, ···, i_n) is a permutation of (1, ···, n). Let A_0 
be the subfield of A induced by the statistic t_0. That is, A_0 is the collection 
such that A ∈ A_0 if and only if A ∈ A and there is a subset D of the range of t_0 
such that t_0^{-1}(D) = A. Here, a_{0x} = t_0(x) and a_x = {x}, and the assumptions of 
Theorem 2 are satisfied. 

REMARK 2. With reference to Example 1, let t_0 and t be functions on X 
satisfying t_0(x) = |x| if x ∈ X, t(x) = x if x ∈ S, = |x| if x ∈ X − S. The statistics 
t_0 and t induce the subfields A_0 and B, respectively, of Example 1. Since a 
statistic is sufficient if and only if its induced subfield is sufficient, we have that 
t_0 is sufficient but t need not be sufficient. This is in spite of the fact that t_0 = 
F(t) for some function F. 

Or with reference to Theorem 2, let t_0(x) = a_{0x} if x ∈ X, t(x) = a_x if x ∈ S, = a_{0x} 
if x ∈ X − S, where S is as described in the proof of Theorem 2. One can proceed 
as in the above paragraph and obtain a similar conclusion. 

REMARK 3. Example 1, Theorem 2, and the above remarks indicate that
sometimes a nonsufficient subfield or statistic can be as "informative" as a sufficient 
subfield or statistic. Accordingly, the definition of sufficiency in terms of con- 
ditional expectations, like most definitions, does not seem to capture all of the 
intuitive content commonly associated with the concept being defined. Needless 
to say, this, in itself, is not necessarily regrettable. 


3. Sufficiency in the general case. Throughout this section, except in Example 
3, (X, A, P) is any probability structure. Making no further assumptions, we 
now prove several results about the sufficient subfields of A. These results 
are easily shown to be true if P is assumed to be dominated. Without this 
assumption, these results and their proofs become somewhat more interesting. 

THEOREM 3. Suppose that A_1, A_2, ··· are sufficient subfields. 

(i) If A_1 ⊃ A_2 ⊃ ···, then ∩_{n=1}^∞ A_n is sufficient. 

(ii) If A_1 ⊂ A_2 ⊂ ···, then ∨_{n=1}^∞ A_n is sufficient. 

PROOF. Let f be a bounded A-measurable function. There is, for each n, an 
A_n-measurable function g_n such that g_n = E_p(f | A_n)[p], p ∈ P. Let g(x) = 
lim_{n→∞} g_n(x) for all x at which the limit exists, = 0 otherwise. 

Suppose that A_1 ⊃ A_2 ⊃ ···. Then g is ∩_{n=1}^∞ A_n-measurable. By the continuity 
theorem for conditional expectations [4, p. 331], lim_{n→∞} g_n = E_p(f | ∩_{n=1}^∞ A_n)[p], 
p ∈ P. Therefore, g = E_p(f | ∩_{n=1}^∞ A_n)[p], p ∈ P. Hence, ∩_{n=1}^∞ A_n is sufficient. 
The proof of (ii) is similar. 

THEOREM 4. If A_1 and A_2 are sufficient subfields and N is contained in at least 
one of these subfields, then the subfield A_1 ∩ A_2 is sufficient. 

PROOF. Suppose that A_1 and A_2 are sufficient subfields and, without loss of 
generality, that N ⊂ A_2. If n is a positive integer let A_{2n−1} = A_1 and A_{2n} = A_2. 
Let f be a bounded A-measurable function. Define g_1, g_2, ··· inductively as 
follows: Let g_1 be an A_1-measurable function satisfying g_1 = E_p(f | A_1)[p], p ∈ P. 
If g_{n−1} has been defined, let g_n be an A_n-measurable function satisfying 
g_n = E_p(g_{n−1} | A_n)[p], p ∈ P. Such a sequence g_1, g_2, ··· exists because A_1, 
A_2, ··· are sufficient subfields. Let g(x) = lim_{n→∞} g_{2n−1}(x) for all x at which 
the limit exists, = 0 otherwise. Let h(x) = lim_{n→∞} g_{2n}(x) for all x at which the 
limit exists, = 0 otherwise. Then g is A_1-measurable and h is A_2-measurable. 
If p is in P let A_{ip} be the smallest σ-field containing A_i and the collection of 
p-null sets. Then, by a theorem proved in [3], lim_{n→∞} g_n = E_p(f | A_{1p} ∩ A_{2p})[p], 
p ∈ P, implying that g = E_p(f | A_{1p} ∩ A_{2p})[p], p ∈ P, and that {x | g(x) ≠ h(x)} 
is in N. Thus, since N ⊂ A_2, we have that g − h is A_2-measurable. Therefore, 
g = h + (g − h) is the sum of two A_2-measurable functions, hence is 
A_2-measurable. Since g is measurable with respect to both A_1 and A_2, it is 
A_1 ∩ A_2-measurable. Moreover, for p ∈ P, 

$$\begin{aligned} g &= E_p(g \mid A_1 \cap A_2)\,[p] \\ &= E_p(E_p(f \mid A_{1p} \cap A_{2p}) \mid A_1 \cap A_2)\,[p] \\ &= E_p(f \mid A_1 \cap A_2)\,[p], \end{aligned}$$

since A_1 ∩ A_2 is contained in A_{1p} ∩ A_{2p}. Thus, A_1 ∩ A_2 is sufficient. 
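
The alternating scheme g_1 = E_p(f | A_1), g_2 = E_p(g_1 | A_2), ··· used in the proof can be watched converging on a finite structure. The following is a minimal sketch under assumptions of ours: a hypothetical six-point space with every point of positive mass (so that N is trivial), each subfield given by a partition, and floating-point arithmetic. On such a space the intersection A_1 ∩ A_2 is generated by the partition obtained by gluing overlapping blocks, and the iterates should approach the conditional expectation with respect to that partition.

```python
def cond_exp(f, partition, p):
    """E_p(f | field generated by `partition`): p-weighted mean on each block."""
    out = {}
    for block in partition:
        mass = sum(p[x] for x in block)
        mean = sum(f[x] * p[x] for x in block) / mass
        for x in block:
            out[x] = mean
    return out

def common_coarsening(part1, part2):
    """Blocks generating the intersection of the two fields: glue overlapping blocks."""
    blocks = [set(b) for b in part1]
    for b in part2:
        b = set(b)
        touching = [c for c in blocks if c & b]
        blocks = [c for c in blocks if not (c & b)] + [b.union(*touching)]
    return blocks

def alternate(f, part1, part2, p, steps=2000):
    """g_1 = E(f | A_1), g_2 = E(g_1 | A_2), g_3 = E(g_2 | A_1), and so on."""
    g = dict(f)
    for n in range(steps):
        g = cond_exp(g, part1 if n % 2 == 0 else part2, p)
    return g

# Hypothetical six-point structure; every point has positive mass.
p = {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.1}
f = {0: 1.0, 1: 4.0, 2: 2.0, 3: 8.0, 4: 5.0, 5: 7.0}
part1 = [{0, 1}, {2, 3}, {4}, {5}]
part2 = [{0}, {1, 2}, {3}, {4, 5}]

limit = alternate(f, part1, part2, p)
target = cond_exp(f, common_coarsening(part1, part2), p)
print(max(abs(limit[x] - target[x]) for x in p))
```

The printed value is the largest discrepancy between the iterated limit and the direct conditional expectation; the sketch illustrates the theorem of [3] on one example and is not a substitute for it.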

COROLLARY 2. If A_1, A_2, ··· are sufficient subfields such that N ⊂ A_n, n = 1, 
2, ···, then the subfield ∩_{n=1}^∞ A_n is sufficient. 

PROOF. Let B_n = ∩_{i=1}^n A_i. Then, by induction and Theorem 4, each subfield 
B_n is sufficient. Applying part (i) of Theorem 3 now gives the desired result. 

Consider the following two properties which a sufficient subfield A_0 may or 
may not have: 

I. If B is a sufficient subfield satisfying B ⊂ A_0 then A_0 ⊂ B[P]. 

II. If B is a sufficient subfield then A_0 ⊂ B[P]. 

A sufficient subfield A_0 satisfying (II) is sometimes termed a minimal 
sufficient subfield. It might be at least as appropriate, however, especially if the 
discussion is restricted to subfields containing N, to use "least" or "smallest" 
in place of "minimal" and, instead, apply the adjective "minimal" to any 
sufficient subfield A_0 satisfying (I). Whether this is true or not hardly matters in 
the light of the following result: 

COROLLARY 3. If A_0 is a sufficient subfield satisfying (I), then A_0 satisfies (II). 

PROOF. Suppose that A_0 is sufficient and satisfies (I). Let B be sufficient. 
Let A_1 = B ∨ N. It is easy to see that A_1 is sufficient. By Theorem 4, A_0 ∩ A_1 
is sufficient, and, therefore, using (I), A_0 ⊂ (A_0 ∩ A_1) ∨ N ⊂ A_1 ∨ N = B ∨ N, 
the desired result. 

REMARK 4. The condition involving N in Theorem 4 cannot be eliminated 
entirely as the following example shows: 

EXAMPLE 3. Let X be Euclidean 2-space, A the collection of Borel subsets of 
X, and P the family of all probability measures p on A satisfying p(D) = 1 
where 

$$D = \{x \mid x = (x_1, x_2) \in X,\ x_1 = x_2\}.$$

For i = 1, 2, let A_i be the subfield of A induced by t_i where t_i(x) = x_i, x ∈ X. 
It is easy to check that A_1 and A_2 are sufficient but that A_1 ∩ A_2 = {∅, X} is 
not sufficient. 
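
The claim of Example 3 can be made concrete on a finite grid. The sketch below rests on assumptions of ours: Euclidean 2-space is replaced by the hypothetical grid {0, 1, 2} × {0, 1, 2}, only a handful of measures carried by the diagonal are tried, and the function f and all helper names are illustrative. For each coordinate, the function x ↦ f(x_i, x_i) serves as a conditional expectation under every listed p, while under the trivial field the required constant would have to change with p.

```python
from fractions import Fraction
from itertools import combinations, product

n = 3  # hypothetical grid size; X = {0, 1, 2} x {0, 1, 2}
X = list(product(range(n), repeat=2))

def diagonal_measure(weights):
    """Probability on X carried by the diagonal D, p((i, i)) proportional to weights[i]."""
    total = sum(weights)
    return {x: Fraction(weights[x[0]], total) if x[0] == x[1] else Fraction(0)
            for x in X}

def integral(h, B, p):
    """Integral of h over B with respect to p (a finite sum here)."""
    return sum(h(x) * p[x] for x in B)

def induced_sets(coord):
    """The field induced by the coordinate statistic t_{coord + 1}."""
    for k in range(n + 1):
        for E in combinations(range(n), k):
            yield [x for x in X if x[coord] in E]

def f(x):
    return 3 * x[0] + x[1] ** 2

def g(coord):
    """Candidate for E_p(f | A_i): evaluate f at the diagonal point sharing the coordinate."""
    return lambda x: f((x[coord], x[coord]))

measures = [diagonal_measure(w) for w in ([1, 0, 0], [0, 1, 0], [1, 2, 3], [5, 1, 1])]

def coordinate_is_sufficient(coord):
    """Check the defining identity for every listed measure and every induced set."""
    return all(integral(f, B, p) == integral(g(coord), B, p)
               for p in measures for B in induced_sets(coord))

# Under the trivial field {empty set, X} the only candidates are constants,
# and the constant would have to equal the mean of f under every p at once.
means = {integral(f, X, p) for p in measures}

print(coordinate_is_sufficient(0), coordinate_is_sufficient(1), len(means) > 1)
```

Here N is far from trivial (every set disjoint from the diagonal is P-null), which is why Theorem 4 does not apply.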

REMARK 5. The uncountable analogue of Corollary 2 is not true. That is, 
there can exist a family H of sufficient subfields, each containing N, such that 
∩{B | B ∈ H} is not sufficient. Such an example is given by Pitcher [7]. If no 
such example existed, there would always exist a minimal sufficient subfield, 
contrary to fact [7]. 

LEMMA 1. If B is a sufficient subfield and A belongs to A, then the smallest σ-field 
containing B ∪ {A} is sufficient. 

PROOF. Let C be the smallest σ-field containing B ∪ {A}. Then 

$$C = \{(B_1 \cap A) \cup (B_2 \cap A') \mid B_i \in B,\ i = 1, 2\}$$

where A' = X − A. For the purposes of this proof, if h is P-integrable let h' 
denote any B-measurable function satisfying h' = E_p(h | B)[p], p ∈ P. Let 
r = 1 − s be the characteristic function of A. 

We now show that C is sufficient. Let f be an A-measurable function into [0, 1]. 
Let 

$$g_1(x) = \begin{cases} (rf)'(x)/r'(x) & \text{if } r'(x) \neq 0, \\ 0 & \text{if } r'(x) = 0, \end{cases}$$

and let g_2 be defined similarly using s in place of r. Then g_1 and g_2 are 
B-measurable and 

$$g = r g_1 + s g_2$$

is a C-measurable function. Since 0 ≤ rf ≤ r, we have that 0 ≤ (rf)' ≤ r' [P], 
r'g_1 = (rf)' [P], and 0 ≤ g_1 ≤ 1 [P]. Similar results hold for s and g_2. Let C = 
(B_1 ∩ A) ∪ (B_2 ∩ A') where B_i ∈ B, i = 1, 2. Then, for p ∈ P, 

$$\begin{aligned} \int_{B_1 \cap A} g\,dp &= \int_{B_1} r g_1\,dp = \int_{B_1} (r g_1)'\,dp = \int_{B_1} r' g_1\,dp \\ &= \int_{B_1} (rf)'\,dp = \int_{B_1} rf\,dp = \int_{B_1 \cap A} f\,dp. \end{aligned}$$

Similarly, 

$$\int_{B_2 \cap A'} g\,dp = \int_{B_2 \cap A'} f\,dp, \qquad p \in P. \qquad \text{Hence} \quad \int_C g\,dp = \int_C f\,dp, \qquad p \in P.$$

The sufficiency of C follows. 

THEOREM 5. If A_1 is a sufficient subfield and A_2 is a separable subfield, then 
A_1 ∨ A_2 is sufficient. In particular, if B is a separable subfield containing a 
sufficient subfield, then B is sufficient. 

PROOF. Let A_1, A_2, ··· be sets in A such that A_2 is the smallest σ-field 
containing {A_1, A_2, ···}. Let B_0 = A_1 and define B_1, B_2, ··· inductively as follows: 
If n is a positive integer and B_{n−1} has been defined, let B_n be the smallest σ-field 
containing B_{n−1} ∪ {A_n}. Using Lemma 1, it follows that each of B_1, B_2, ··· is 
sufficient. Clearly, B_1 ⊂ B_2 ⊂ ··· and ∨_{n=1}^∞ B_n = A_1 ∨ A_2. Thus, by Theorem 
3, A_1 ∨ A_2 is sufficient. The second assertion of the theorem is an immediate 
consequence of the first. 


4. On the smallest subfield containing two sufficient subfields. Let A_1 and 
A_2 be sufficient subfields containing N. Then A_1 ∩ A_2 is sufficient by Theorem 4. 
Is A_1 ∨ A_2 also sufficient? It is perhaps somewhat surprising to discover that 
A_1 ∨ A_2 need not be. 

EXAMPLE 4. Let X be the set of all points x = (x_1, x_2) of Euclidean 2-space 
satisfying |x_1| = |x_2| and x_1 ≠ 0. Let r_1(x) = (x_1, −x_2), r_2(x) = (−x_1, x_2), 
a_{ix} = {x, r_i(x)}, x ∈ X, and A_i be the smallest σ-field containing {a_{ix} | x ∈ X}, 
i = 1, 2. Let B = A_1 ∨ A_2, D = {x | x ∈ X, x_1 = x_2}, and A be the smallest 
σ-field containing B ∪ {D}. If x ∈ X, let p_x be the probability measure on A 
putting probability 1/4 on each of the points x, (x_1, −x_2), (−x_1, x_2), (−x_1, −x_2). 
Finally, let P = {p_x | x ∈ X}. The set A is in A_i if and only if there is a countable 
set S ⊂ X such that ∪{a_{ix} | x ∈ S} is either A or A', primes being used to denote 
complements. A subset B of X is in B if and only if B or B' is countable. Thus, 
if x ∈ X then {x} ∈ B but D is not in B. Clearly, 

$$A = \{(B_1 \cap D) \cup (B_2 \cap D') \mid B_i \in B,\ i = 1, 2\}.$$

Here N = {∅, X}, hence is contained in any subfield. 

Let i = 1 or 2. Then A_i is sufficient, as we now show. Let f = f_1 + f_2 where 
f_1 is the characteristic function of B_1 ∩ D, f_2 is the characteristic function of 
B_2 ∩ D', and B_1 and B_2 belong to B. Let g_1 = f_1 + f_1(r_i), g_2 = f_2 + f_2(r_i), and 
g = (g_1 + g_2)/2. If B_1 is countable, then {x | g_1(x) ≠ 0} is countable. If B_1' is 
countable, then {x | g_1(x) ≠ 1} is countable. Therefore, in either case, since 
g_1 = g_1(r_i), g_1 is A_i-measurable. Similarly, g_2 is A_i-measurable implying that g 
is A_i-measurable. If A_i ∈ A_i and p ∈ P, it is clear that ∫_{A_i} f(r_i) dp = ∫_{A_i} f dp, 
implying that 

$$\int_{A_i} g\,dp = \int_{A_i} \tfrac{1}{2}\,(f + f(r_i))\,dp = \int_{A_i} f\,dp.$$

Therefore, A_i is a sufficient subfield. 

However, B = A_1 ∨ A_2 is not sufficient. Otherwise, there would exist a 
B-measurable function g satisfying p(D ∩ B) = ∫_B g dp, B ∈ B, p ∈ P. In particular, 
p_x(D ∩ {x}) = ∫_{{x}} g dp_x, x ∈ X, implying that g is the characteristic function of 
D. This is a contradiction since D is not in B. 

REMARK 6. The proof of Theorem 4 was based on a theorem proved in [3] 
which gives a simple way of obtaining the operator E_p(· | A_1 ∩ A_2) from the 
operators E_p(· | A_1) and E_p(· | A_2). That there can be no closely analogous 
result for obtaining E_p(· | A_1 ∨ A_2) from E_p(· | A_1) and E_p(· | A_2) is implied 
by the above example. 

Of course, certain extra assumptions, in addition to the assumption that A_1 
and A_2 are sufficient, imply that A_1 ∨ A_2 is sufficient. One such extra assumption 
is that P be a dominated family of measures. Another is that either A_1 or A_2 
be separable (see Theorem 5). Still another is given in the following theorem. 

THEOREM 6. Suppose that A is separable. If A_1 and A_2 are sufficient subfields, 
then A_1 ∨ A_2 is sufficient. 

PROOF. By Theorem 1, there are separable sufficient subfields B_1 and B_2 such 
that 

$$B_i \subset A_i \subset B_i \vee N, \qquad i = 1, 2.$$

Therefore, B_1 ∨ B_2 is sufficient, by Theorem 5, and 

$$B_1 \vee B_2 \subset A_1 \vee A_2 \subset B_1 \vee B_2 \vee N,$$

implying that A_1 ∨ A_2 is sufficient. 

COROLLARY 4. Suppose that A is separable. If A_1, A_2, ··· are sufficient 
subfields, then ∨_{n=1}^∞ A_n is sufficient. 

PROOF. It follows from Theorem 6 that ∨_{i=1}^n A_i is sufficient for each positive 
integer n. By Theorem 3, the desired result follows. 


5. Separability and sufficiency. Separability of A or of one of its subfields 
plays an important role in Theorems 1, 5, 6, and elsewhere in the above sections. 
Even so, probably less can be said about sufficiency in the separable case than 
about sufficiency in the dominated case. Whether or not this is true, it should 
be kept in mind that nearly all, if not all, of the probability structures of im- 
portance in statistical work satisfy the condition that A is separable, but many 
do not satisfy the condition that P is dominated. 

As usual, let (X, A, P) be any probability structure. Let D be the collection 
of Borel subsets of the real line. If B_0 is a separable subfield of A then there is 
an A-measurable function f such that f^{-1}(D) = {f^{-1}(D) | D ∈ D} = B_0. (See 
Lemma 4 of [1], for example. Bahadur's blanket assumption that X is Euclidean, 
and so forth, is, of course, not needed and not used in his proof of Lemma 4.) 
Therefore, as an immediate consequence of Theorem 1, we have the following: 

THEOREM 7. Suppose that A is separable. If B is a sufficient subfield, then there 
is an A-measurable function f such that 

$$f^{-1}(D) = B\,[P].$$

This should be compared to a result of Bahadur: If A is separable, P is 
dominated, and B is a subfield, then there is an A-measurable function f such that 
f^{-1}(D) = B[P]. (This follows from Lemmas 3 and 4 of [1].) Theorem 7 indicates 
that if one adds the assumption that B is sufficient, then one can drop the 
assumption that P is dominated. 

Of course, in Theorem 7 and the above, f could equally well be a measurable 
transformation into any Euclidean space with D again denoting the collection 
of Borel subsets of the space. 

ON A SPECIAL CLASS OF RECURRENT EVENTS 


By M. P. SCHÜTZENBERGER 
Université de Poitiers and University of North Carolina 


I. Introduction. Let F be the set of all finite sequences (words) in the symbols 
x ∈ X. According to W. Feller ([2], Chap. VIII), a recurrent event ℰ is a pair 
(A, μ) where A is a subset of F and μ a probability measure fulfilling the 
conditions recalled below; one says that the event ℰ = (A, μ) occurs at the last letter 
x_{i_n} of a word f = x_{i_1} x_{i_2} ··· x_{i_n} if and only if f belongs to the set A; we shall call 
A the support of ℰ and denote by T(A, μ) the mean recurrence time of the 
event ℰ. 

If the pair (B, μ') defines another recurrent event on F, the pair (A ∩ B, μ') 
defines also a recurrent event. It results from the general theory of Feller ([2], 
Chap. VIII) that, when T(B, μ') is finite, the ratio π = T(B, μ')/T(A ∩ B, μ') 
is, in a certain sense, the limit of the conditional probability that a random word 
f ∈ F belongs to A when it is known to belong to B. For given arbitrary A, it is 
in general possible to find infinitely many (B, μ') having finite T(B, μ') which 
are such that π = 0. 

The main point of this note is to verify several statements which, together, 
imply the following property: 

PROPERTY 1. If the support A is such that T(A ∩ B, μ') is finite for every 
recurrent event (B, μ') having finite T(B, μ'), then, for every such (B, μ'), π^{-1} is 
an integer at most equal to a certain finite number δ* which depends only upon A. 

Classical examples of this occurrence are the return to the origin in random 
walks over a finite group [3] and, in particular, the recurrent event which occurs 
at the end of every word whose length is an integral multiple of a particular 
integer. 

In Section II, we discuss some properties of a class of recurrent events which 
we shall call birecurrent; in Section III, we verify the statements mentioned 
above, and in Section IV we describe examples of birecurrent supports. 


II. Preliminary remarks. We consider F as the free monoid ([1], Chap. 1) 
generated by X; the empty word e is the neutral element of F and the product 
ff' of the words f and f' is the word f'' made up of f followed by f'; f (f') is called 
a left (right) factor of f''; a word is proper if it is different from e. 

Feller's condition ([2], Chap. VIII) that the non empty subset A of F is the 
support of a recurrent event can be expressed as follows: U_r: if a ∈ A and f ∈ F, 
then af ∈ A if and only if f ∈ A. This condition implies that A is a submonoid of 
F (i.e., that e ∈ A and A² ⊂ A). We shall say that A is birecurrent if it satisfies 
U_r and the symmetric condition U_l: if a ∈ A and f ∈ F, then fa ∈ A if and 
only if f ∈ A. 
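
Conditions U_r and U_l can be tested by brute force on short words. The sketch below is a bounded check only, under assumptions of ours: the alphabet is the hypothetical two-letter set {a, b}, the support A is taken to be the submonoid generated by a finite set `code` of words, and only words up to a fixed length are examined. The set {a, ba} gives a support satisfying U_r but not U_l (a ∈ A and ba ∈ A, yet b ∉ A), while {a, bb} and the set of all words of length 2 pass both checks.

```python
from itertools import product

def words(alphabet, max_len):
    """All words over `alphabet` of length at most max_len, the empty word included."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def in_submonoid(word, code):
    """True if `word` is a (possibly empty) product of words from `code`."""
    reachable = [True] + [False] * len(word)
    for i in range(len(word)):
        if reachable[i]:
            for c in code:
                if word.startswith(c, i):
                    reachable[i + len(c)] = True
    return reachable[-1]

def satisfies_U(code, alphabet, max_len, side):
    """Bounded check of U_r (side='r') or U_l (side='l') for A generated by `code`."""
    ws = list(words(alphabet, max_len))
    A = [w for w in ws if in_submonoid(w, code)]
    for a in A:
        for f in ws:
            product_word = a + f if side == "r" else f + a
            if in_submonoid(product_word, code) != in_submonoid(f, code):
                return False
    return True

for code in ({"a", "ba"}, {"a", "bb"}, {"aa", "ab", "ba", "bb"}):
    print(sorted(code),
          satisfies_U(code, "ab", 4, "r"),
          satisfies_U(code, "ab", 4, "l"))
```

A passing result only says that no violation exists among the words examined; it is evidence for, not a proof of, the corresponding condition.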

It follows immediately that, if {A_i} is any collection of supports of recurrent 
(birecurrent) events, the same is true of the intersection C of the sets A_i; indeed, 
C is a submonoid because every A_i is a submonoid and, if, e.g., a, af ∈ C, the 
word f belongs to all the sets A_i (because of U_r) and consequently it belongs 
also to C. 

Throughout this paper, A will denote a recurrent (or, eventually, birecurrent) 
support and we shall use the following notations: 

A* = the set of all the proper words at the end of which the event whose 
support is A occurs for the first time; for any recurrent support B, B* is 
defined similarly. 

S = F − A*F (= the complement in F of the right ideal A*F); 
R = F − FA*. 

We state explicitly the following well known facts: 


II.1. Every f ∈ F admits one and only one factorization f = as with a ∈ A and 
s ∈ S and at least one factorization f = ra' with a' ∈ A and r ∈ R. If and only 
if A is birecurrent the second factorization is unique for all f ∈ F. 
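
Statement II.1 can be explored by enumerating both kinds of factorization for short words. The sketch assumes, as an illustration of ours, that A is generated by a finite set `code` no word of which is a left factor of another, so that A* = `code`, S consists of the words with no left factor in `code`, and R of the words with no right factor in `code`. With the hypothetical choice {a, ba} (recurrent but not birecurrent) the word ba has two factorizations ra'; with {a, bb} every word examined has exactly one.

```python
from itertools import product

def words(alphabet, max_len):
    """All words over `alphabet` of length at most max_len, the empty word included."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def in_A(word, code):
    """True if `word` is a (possibly empty) product of words from `code`."""
    reachable = [True] + [False] * len(word)
    for i in range(len(word)):
        if reachable[i]:
            for c in code:
                if word.startswith(c, i):
                    reachable[i + len(c)] = True
    return reachable[-1]

def in_S(word, code):
    """S = F - A*F: no word of `code` is a left factor of `word`."""
    return not any(word.startswith(c) for c in code)

def in_R(word, code):
    """R = F - FA*: no word of `code` is a right factor of `word`."""
    return not any(word.endswith(c) for c in code)

def left_factorizations(f, code):
    """All pairs (a, s) with f = a s, a in A, s in S."""
    return [(f[:i], f[i:]) for i in range(len(f) + 1)
            if in_A(f[:i], code) and in_S(f[i:], code)]

def right_factorizations(f, code):
    """All pairs (r, a') with f = r a', r in R, a' in A."""
    return [(f[:i], f[i:]) for i in range(len(f) + 1)
            if in_R(f[:i], code) and in_A(f[i:], code)]

print(left_factorizations("ba", {"a", "ba"}))
print(right_factorizations("ba", {"a", "ba"}))
print(right_factorizations("ba", {"a", "bb"}))
```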


II.1'. Every proper a of A admits a unique factorization as a product of 
elements of A*. 

The two statements are quite intuitive but a formal proof of them has been 
given in ([5]); II.1' shows that any bijection (i.e., one to one mapping onto) of 
A* onto a set Y can be extended to an isomorphism of A onto the free monoid 
generated by Y. 

The following remark will be used repeatedly in the course of this paper: 


II.1''. When A is birecurrent, if s, s' ∈ S (r, r' ∈ R) are such that s is a right 
factor of s' (r is a left factor of r') and that sf, s'f ∈ A (fr, fr' ∈ A) for some 
f ∈ F, then s = s' (r = r'). If, furthermore, f ∈ R (f ∈ S), then sf ∈ A* ∪ {e}. 

PROOF. Because of the perfect symmetry of U_r and U_l we can limit ourselves 
to the proof of the statement concerning s and s'. By hypothesis, s' = f's for 
some f' ∈ S and sf, f'sf ∈ A; because of U_l, this implies f' ∈ A. Because of s' ∈ S = 
F − A*F and II.1', this, in turn, implies f' = e, and we have proved that s' = 
es = s. Let us assume now that sr ∈ A with s ∈ S and r ∈ R. If, in addition, 
sr = e, the result is proved. If sr ∈ A − {e}, II.1' shows that sr = aa' with 
a ∈ A* and a' ∈ A; as above, a cannot be a left factor of s and, consequently, a' 
is a right factor of r; but, by a symmetrical argument, this shows that a' = e and 
that consequently sr = a ∈ A*. This concludes the proof of II.1''. 

Let us assume now that $A$ is birecurrent; we denote by $ASf$ ($ARf$) the set of the right (left) factors of $f$ that belong to $S$ ($R$) and by $Af$ the set of the triples $(r, a, s)$ such that $f = ras$ and that $r \in R$, $a \in A$, $s \in S$; such a triple will be called an $A$-factorization of $f$, and $\delta f$ will denote the number of distinct triples in the set of the $A$-factorizations of $f$.
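For a finite generating set the number $\delta f$ can be obtained by direct enumeration of the triples $(r, a, s)$. The following Python sketch is an illustration only: it assumes that words are strings, that `generators` stands for $A^*$ and is a prefix code, and that $R = F - FA^*$ and $S = F - A^*F$ as above; the function names are hypothetical.

```python
def in_monoid(w, generators):
    """True if w is a product of generators (assumed to form a prefix code)."""
    pos = 0
    while pos < len(w):
        match = next((g for g in generators if g and w.startswith(g, pos)), None)
        if match is None:
            return False
        pos += len(match)
    return True


def a_factorizations(f, generators):
    """List all triples (r, a, s) with f = r + a + s, r in R, a in A, s in S."""
    triples = []
    for i in range(len(f) + 1):
        r = f[:i]
        # r must lie in R = F - F A*: no generator is a right factor of r.
        if any(g and r.endswith(g) for g in generators):
            continue
        for j in range(i, len(f) + 1):
            a, s = f[i:j], f[j:]
            # s must lie in S = F - A* F: no generator is a left factor of s.
            if any(g and s.startswith(g) for g in generators):
                continue
            if in_monoid(a, generators):
                triples.append((r, a, s))
    return triples


if __name__ == "__main__":
    A_star = ["aa", "ab", "ba", "bb"]  # hypothetical example: all words of length 2
    print(a_factorizations("abab", A_star))
```

When the generators are all the words of length 2, the word `abab` has exactly two such triples, one with empty $r$ and $s$ and one with $r$ and $s$ of length 1.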


II.2. For any $f, f' \in F$, $\delta ff' \geq \max(\delta f, \delta f')$ and $\delta ff' = \delta f$ ($= \delta f'$) if and only if for every left (right) factor $f''$ of $f'$ (of $f$) the product $ff''$ ($f''f'$) has a factorization $ff'' = sa$ ($f''f' = ar'$) where $a \in A$ and where $f''$ is a right (left) factor of $a$.

Proof. Let us consider any element $g \in F$ and prove that there exists a bijection $\sigma_g : ARg \to ASg$. Indeed, by II.1, to any $r \in ARg$ (i.e., to any $r \in R$ which is such that $g = rg'$ for some $g' \in F$) there corresponds a unique $s \in ASg$ (determined by the conditions $g' = as$, $a \in A$, $s \in S$) which we call $\sigma_g r$; because of the symmetry implied by the hypothesis that $A$ is birecurrent we can construct in a similar manner a mapping $ASg \to ARg$ which we call $\sigma'_g$. Since, clearly, for any $r \in ARg$ we have $(\sigma'_g \circ \sigma_g)r = r$ and similarly for any $s \in ASg$, this shows that $\sigma_g$ is a bijection and also that the $A$-factorizations of $g$ are in a one-to-one correspondence with the elements of $ARg$.

We now revert to the proof of II.2. By the above construction we know that $\delta ff'$ is equal to $\delta f$ (i.e., to the number of elements in $ARf$) plus the number of proper $r' \in ARf'$ such that $fr' \in R$. Thus, $\delta ff' \geq \delta f$ with the equality sign if and only if we do not have $ff'' \in R - ARf$ for some left factor $f''$ of $f'$, i.e., if and only if every such $ff''$ satisfies the condition stated in II.2. Because of the symmetry this concludes the proof.

For any $f \in F$, let us denote by $\alpha f$ the smallest positive integer for which $f^{\alpha f} \in A$; $\alpha f$ is infinite if the only finite power of $f$ that belongs to $A$ is $f^0$ ($= e$, by definition).


II.3. A sufficient condition that the recurrent support $A$ is birecurrent is that $\alpha f$ is finite for all $f \in F$; reciprocally, if $A$ is a birecurrent support, then, for any $f \in F$, $\alpha f$ is at most equal to the supremum $\delta' f$ of $\delta f^n$ over all the positive powers of $f$.

Proof. By hypothesis, $A$ satisfies U, and, in order to show that it is birecurrent, it will be enough to show that if $a$ and $fa$ belong to $A$ then $f$ also belongs to $A$. Let us assume that $(af)^m \in A$ for some positive finite $m$; we have $(af)^m = a(fa)^{m-1}f \in A$ and, because of the fact that $a, (fa)^{m-1} \in A$ and U,, this implies $f \in A$. This proves the first part of II.3.

Now let $A$ be birecurrent and $f$ such that $\delta' f$ is finite; by II.1, any $f^n$ ($0 \leq n \leq \delta' f$) admits an $A$-factorization $(e, a_n, s_n)$ and, by II.2, to each such $s_n$ there corresponds one $A$-factorization of $f^{\delta' f}$. Since, by definition, $\delta f^{\delta' f} \leq \delta' f$, we must have $s_n = s_m$ ($= s$, say) with $0 \leq m, n \leq \delta' f$ and, e.g., $m < n$. Thus, $f^n = as$ and $f^m = a's$ with $a, a' \in A$ and, after cancelling $s$, we obtain $f^{n-m}a' = a$. Because of U,, this last relation shows that $f^{n-m}$ belongs to $A$ and, since $0 < n - m \leq \delta' f$, by construction, the result is entirely proved.
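The quantity $\alpha f$ can be computed by testing successive powers of $f$. The Python sketch below is an illustration only. The nine-word set `CODE3` is a hypothetical example chosen by us (no word in it is a left or a right factor of another); it is assumed to generate a birecurrent support, and the cut-off `max_power` is an arbitrary safeguard that is not part of the definition.

```python
# Hypothetical example of a generating set A*: no word is a left or right
# factor of another word of the set.
CODE3 = ["aaa", "aaba", "aabb", "ab", "baa", "baba", "babb", "bba", "bbb"]


def in_monoid(w, generators):
    """True if w is a product of generators (assumed to form a prefix code)."""
    pos = 0
    while pos < len(w):
        match = next((g for g in generators if g and w.startswith(g, pos)), None)
        if match is None:
            return False
        pos += len(match)
    return True


def alpha(f, generators, max_power=50):
    """Smallest n >= 1 with the n-th power of f in A, or None if none is found up to max_power."""
    for n in range(1, max_power + 1):
        if in_monoid(f * n, generators):
            return n
    return None


if __name__ == "__main__":
    for word in ["a", "b", "ab", "aab"]:
        print(word, alpha(word, CODE3))
```

For this example set the word `aab` is not a product of generators, but its square parses as `aaba` followed by `ab`, so the search stops at the second power.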

Let us assume now that $A$ is birecurrent and that $f$ is such that $\delta f = \delta f^2 < \infty$. We consider the set $K$ (containing at least $f^2$) defined by $K = \{f' \in fFf : \delta f' = \delta f\}$.


II.4. There exists a group $G$, a subgroup $H$ of $G$ and a mapping $\sigma : K \to G$ that have the following properties: $\sigma$ is an epimorphism (i.e., homomorphism onto) and $G$ is finite; $\sigma^{-1}H = K \cap A$ and the index of $H$ in $G$ is at most $\delta f$.

Proor. According to II.2, the hypothesis 6f = 4f* implies the existence of a 
bijection o*: ASf — ARf defined for each s ¢ ASf by o*s, the unique r ¢ ARf 
which is such that sr ¢ A; trivially, o*e = e. Also, by II.2 and the very definition 
of K, we have ARk = ARf and ASk = ASf for any k ¢ K; consequently, K’ c K. 
Thus, recalling the definition of o; given in the proof of I1.2, we can associate
to any k ¢ K a bijection of : ARf = ARf defined by of = o* ox. 

Let us now verify that for any k, k’ « K we have ox = o4 © o7. Indeed, if 
(r, a, s) ¢€ Ak and (r’, a’, s’) ¢ Ak’ we shall have (r, a”, s’) ¢ Akk’ for some a” ¢ A 
if and only if sr’ ¢« A and the identity is verified. Because of the hypothesis that 
éf is finite, this construction shows that the set {oz} (k e K) is a group G and 
that the mapping o which sends every k ¢ K onto of is an epimorphism. 

Observe now that k belongs to A if and only if (e, k, e) e Ak, that is, if and 
only if of keeps e invariant. Again, because G is finite, the elements k ¢ K which 
have this last property map onto a subgroup H of G and, clearly, o ‘H is con- 
tained in A. The fact that the index of H in G is at most equal to the number 
of elements in ARf (i.e., to the number 4f) is a standard result from group 
theory. As a corollary of I1.4 we state I1.4’. 


II.4’. If A is such that the supremum 6* of 6f’ over all f’ ¢ F is finite and if 
éf = 6*, then the representation {o;} described in II.4 is isomorphic to the rep- 
sentation of G over the cosets of H. 

Proor. Thé property stated amounts to the statement that the group 
G = {oz} is transitive or, in an equivalent fashion, to the fact that for every 
s e ASf there exists at least one k ¢ K such that oe = s, i.e., such that k= as 
with ae A. 

In order to prove this, let (r, a’, s) « Af. By I1.3 we know that there exist 
finite positive integers m and m’ such that f" ¢ A andr” ¢ A. Thus the product 
ff" f = f"f" as admits the factorization a”s with a” = f"f" a’ ¢ A and it 
belongs to K since, under the hypothesis that 6f is maximal, K is identical to fFf. 

The next statement is not needed for the verification of property 1. Its aim 
is to show that the representation described in Section IV below covers all the 
birecurrent supports with finite 6* = sup @f. 


I1.5. If A is a birecurrent support with finite 6* there exists a monoid M and 
an epimorphism (homomorphism onto) y:F — M such that y “yA = A, and 
that M admits minimal ideals. 

Proor. Let us consider any f ¢ F and denote by {yf} the set of all f’ ¢ F which 
satisfy the following condition: for any f; , fo ¢ F, fiffe ¢ A if and only if fif’fe ¢ A. 
The relation f’ ¢ {yf} is reflexive and transitive and it is well known that it is 
compatible with the multiplicative structure of F (i.e., it is a congruence ); thus 
we can identify each set {yf} with an element yf of a certain quotient monoid 
M of F. Since f ¢ A if and only if fif fe ¢ A with f; = fe = e, A is the union of 
the sets {ya} (a ¢ A) and, trivially, y yA = A. 

Let us now take an element f such that 6f = 6*, a finite quantity; according 
to I].2, the maximal character of df implies that for every f; the product fif has 
a left factor fir ¢ A for some r ¢ ARf. Thus, because of the symmetry, any rela- 
tion fif fe ¢ A implies fir, sf ¢ A with (r, a, s) & Af. 

It follows immediately that for any two k, k’ e K(= fFf), the relation yk = 
yk’ is equivalent to the relation ck = ck’ in the notations of II.4. Thus, cK is 
isomorphic to a group and since K is the intersection of a right and of a left ideal
of F, this shows that M admits minimal ideals. 

We now revert to the preparation of the proof of the main property and we 
consider A, a birecurrent support, B a recurrent support and C = Af) B; we 
assume that C does not reduce to {e} and that consequently C* (the set of the 
proper words at the end of which the events whose supports are A and B respec- 
tively occur together for the first time) is not empty. 


II.6. Any element f from F — C*F has a unique factorization f = fife with 
fi e B — C*B and f. ¢ F — B*F; conversely any such product fife belongs to 
F — C*F. 

Proor. Because of II.1 any f has a unique factorization f = fife with f; « B 
and fe ¢ F — B*F. Since C is a revurrent support contained in B, any product 
fifs with f; « B and f; « F — B*F belongs to F — C*F if and only if f; belongs 
to B — C*B and this concludes the proof. 

As mentioned in II.1’, there exists an isomorphism 8:B — Q where Q is the 
free monoid generated by Q* = 8B* and it is easily verified that the image P of 
C by 6 satisfies U, and U, when, according to our hypothesis, A is birecurrent. 
Indeed, P is surely a submonoid of Q and it is enough to verify that the relations 
p, p', pgp’ € P imply q ¢ Q (because 8 ‘p, Bp’, B ‘pgp’ ¢ A imply, e.g., 8 ‘gp ¢ A, 
by U,, then 6"'g ¢ A, by U, and, finally g e P = B(AN B)). 

As before, we define a P-factorization of an element q¢ ¢ Q as a triple (7, p, 3) 
such that ¢ = 7p3 and that 7 e R = Q — QP*, pe P, se S8 = Q — P*Q with 


P* = BC*. All the remarks made in II.2 apply here since P is a birecurrent sup- 
port in Q, and we define 6g as the number of P-factorizations of q. 


II.7. For any b ¢ B, 58b S 6b. 

Proor. Let 7 be any element of R and define 8*7 as the (uniquely determined ) 
element r ¢ R such that (r, a, e) ¢ Ab for some a ¢ A. We show that the restriction 
of the mapping 6* to any set ARgq (q ¢ Q) is an injection (i.e., is one to one into). 
Indeed, if 7, 7” ¢ ARq we have, e.g., 7 = 79’ for some q’ ¢ Q; thus, if 6*7 = B*? 
(= r, say), we have the following relations: 8 ‘7 = ra ¢e B witha ¢ A; BF’ = 
ra’ ¢ B with a’ ¢ A; ra’ = rab’ with b’ = 8''g8 ¢ B. Consequently, a’ = ab’ 
and, because of U,, b’ ¢ A. This shows that g’ = 8b’ belongs to P and that 
finally, g’ = e because of the relation # = 7q’ ¢ R. Thus, # = 7 and our con- 
tention is proved. 

The remark II.7 is also proved since we have shown that for any b ¢ B there 
exists an injection of ARBb into ARb. 


II.8. If 6* (= sup §f) is finite and if 6b = 6* for at least one b ¢ B, then 5* 
(= sup 6q) is a divisor of 4*. 

Proor. Under these hypotheses, we may assume without loss of generality that 
B contains an element f such that 6f = 6* and 68f = 4*. We use the notations 
of II.4 and II.4’. By construction, the image G’ by o of BN K is a subgroup of 
G and we have BN o ‘(HN G’) = AN BN K. Thus, by a standard result of 
group theory the index 6’* of HN G’ in G’ is a divisor of the index of H in G
(i.e., of 6*). We prove now that 6’* is in fact equal to 6*; for this we repeat the 
construction of I1.4 and II.4’ with 86(BN K) in the role of K and we obtain an 
epimorphism ¢: 8(BN K) — G such that 6* is the index of the subgroup H of G. 
We recall the definition of the mapping 6* used in II.7 and we observe that we 
can define a bijection 6*': ARSN B*ARBf — ARsf such that p*" o B* is the 
identity mapping of AR@f onto itself; 6** induces in a natural fashion an epi- 
morphism $**: G’ — G and, trivially, HM G’ is the inverse image of A by p**. 
Thus 65* is equal to 6’* and II.8 is proved. 


III. Verification of property 1. We keep the notations already introduced and we assume that $(A, \mu)$ is a recurrent event. According to Feller, $\mu$ satisfies the two conditions:

$M_0$: $\mu e = 1$ and for any $f \in F$, $\mu f = \sum \{\mu fx : x \in X\}$,

$M_1$: if $a \in A$ and $f \in F$ then $\mu af = \mu a \, \mu f$.

We shall say that $\mu$ is a positive product measure if $\mu ff' = \mu f \, \mu f' > 0$ for any $f, f' \in F$, and, in this case, $M_1$ is trivially satisfied.

We denote by $|f|$ the length of the element $f$ and for any subset $F'$ of $F$ we use the following notations: $F'_n = \{f \in F' : |f| \leq n\}$; $\mu F' = \lim_{n \to \infty} \sum \{\mu f : f \in F'_n\}$. It follows that $\mu F' \leq 1$ if $F'$ is such that any $f \in F$ has at most one left factor which belongs to $F'$; this condition is satisfied in particular by any subset of $A^*$ and, according to Feller's definition, we shall say that $(A, \mu)$ is persistent if and only if $\mu A^* = 1$. The next two statements are verified by an imitation of Feller's proof procedure.


III.1. For any recurrent event $(A, \mu)$ we have $T(A, \mu) = \mu S$.

Proof. Let us introduce for any $s \in S$ the notation $S(s) = S \cap sF$. We verify the identities

(III.1) for all $m \geq |s|$: $0 \leq \mu s - \mu A^*_{m+1}(s) = \mu S_{m+1}(s) - \mu S_m(s)$;

(III.1') for all $m \geq 1$: $(1 - \mu A^*) + (\mu A^* - \mu A^*_m) = \mu S_m - \mu S_{m-1}$.

Indeed, (III.1) is an immediate consequence of $M_0$ and of the fact that the sets $\{s\} \cup S_m(s)X$ and $S_{m+1}(s) \cup A^*_{m+1}(s)$ are identical for any $m \geq |s|$. (III.1') is the special case of (III.1) for $s = e$.

From this second identity we deduce that if $\mu A^* = 1$ we have $\lim_{m \to \infty} (\mu S_m - \mu S_{m-1}) = 0$. Thus, a fortiori (from the first identity), $\mu A^* = 1$ implies $\mu s = \mu A^*(s)$. We now sum the second identity from $m = 1$ to $m = n$. After rearranging terms, we obtain:

(III.1'') $\mu S_n = (n + 1)(1 - \mu A^*_n) + \sum \{|a| \, \mu a : a \in A^*_n\}$.


This shows that if $(A, \mu)$ is not persistent, $\mu S$ is infinite, and we assume now that $\mu A^* = 1$. Under this hypothesis, $T(A, \mu)$ is defined as $\lim_{n \to \infty} \sum \{|a| \, \mu a : a \in A^*_n\}$, and since $\mu A^* = 1$ implies that

$(n + 1)(1 - \mu A^*_n) = \sum \{(n + 1) \mu a : a \in A^* - A^*_n\}$,

we can write for all $n$

$\sum \{|a| \, \mu a : a \in A^*_n\} \leq \mu S_n \leq \sum \{|a| \, \mu a : a \in A^* - A^*_n\} + \sum \{|a| \, \mu a : a \in A^*_n\}$.

This concludes the proof since it shows that $\mu S = T(A, \mu)$ if this last quantity is finite and that $\mu S$ is infinite if $T(A, \mu)$ is so.

For any $s \in S$ let us define $R^*(s)$ as $\{e\}$ if $s = e$ and as the set of those $f \in F$ such that $sf \in A^*$ if $s \neq e$.


III.2. If $A$ is birecurrent, $\mu$ a product measure and $(A, \mu)$ persistent, we have $T(A, \mu) = \mu R$ and, for all $s \in S$, $1 = \mu R^*(s)$.

Proof. Under these hypotheses all the notions are perfectly symmetrical. Thus, the identity (III.1'') shows that $\mu R_n = \mu S_n$ and, as a special case, that $\mu R = T(A, \mu)$. Since any $a \in A^*(s)$ has a unique factorization $a = sf$ with $f \in R^*(s)$, and since $\mu$ is a product measure, we have for all $m \geq |s|$ the identity

(III.2) $\mu A^*_m(s) = \mu s \, \mu R^*_{m - |s|}(s)$.

Thus, we have in any case $\mu R^*(s) = \mu A^*(s)/\mu s \leq 1$ because of the formula (III.1), with the equality sign when $(A, \mu)$ is persistent because, as seen above, $\mu s = \mu A^*(s)$.


III.3. If $A$ is birecurrent and $\mu$ a product measure, $T(A, \mu) = \delta^*$.

Proof. We use the notations of Section II and we recall the following facts:

(1) According to II.1'', $R^*(s)$ is a subset of $R$;

(2) for the same reason, if $s, s' \in ASf$ for some $f \in F$, the sets $R^*(s)$ and $R^*(s')$ are disjoint;

(3) if $\delta^*$ is finite and $\delta f = \delta^*$ then, by II.2, to every $r \in R$ there corresponds one $s \in ASf$ such that $sr \in A^*$. Thus, in this case, the union of the sets $R^*(s)$ over all $s \in ASf$ is equal to $R$.

Now to the proof! We shall show that if $\delta f = \delta^*$ we have the inequalities $\mu R \leq \delta f \leq \mu R$ and, trivially, the result will follow by III.2.

The second inequality is vacuously true when $(A, \mu)$ is not persistent since, then, $\mu R$ is infinite. When $(A, \mu)$ is persistent we have for any $f' \in F$ the inequality $\delta f' = \sum \{\mu R^*(s) : s \in ASf'\} \leq \mu R$ since, then, $\mu R^*(s) = 1$ and since the sets $R^*(s)$ are pairwise disjoint. Thus the second inequality is always true. If now $\delta f = \delta^*$, we know by (3) above that $\sum \{\mu R^*(s) : s \in ASf\} = \mu R$. Since in any case, as we have seen in the proof of III.2, we have $\mu R^*(s) \leq 1$, it follows that $\mu R \leq \delta^*$ and the result is proved.
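The identity $T(A, \mu) = \delta^*$ can be checked numerically when $A^*$ is finite, since then $T(A, \mu) = \sum \{|a| \, \mu a : a \in A^*\}$. The Python sketch below is an illustration only: the two generating sets are hypothetical examples chosen by us, $\mu$ is a product measure given by letter probabilities, and exact rational arithmetic is used. For the set of all words of length 2 every long word has two $A$-factorizations; for the nine-word set we do not verify $\delta^*$ here and only observe the value of $T$.

```python
from fractions import Fraction

# Hypothetical examples of finite generating sets A*.
LENGTH2 = ["aa", "ab", "ba", "bb"]
CODE3 = ["aaa", "aaba", "aabb", "ab", "baa", "baba", "babb", "bba", "bbb"]


def word_measure(w, letter_prob):
    """Product measure of a word: the product of its letter probabilities."""
    m = Fraction(1)
    for x in w:
        m *= letter_prob[x]
    return m


def mean_recurrence_time(generators, letter_prob):
    """Return (mu A*, T) where T is the sum of |a| * mu(a) over the finite set A*."""
    total = sum(word_measure(a, letter_prob) for a in generators)
    mean = sum(len(a) * word_measure(a, letter_prob) for a in generators)
    return total, mean


if __name__ == "__main__":
    skew = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
    print(mean_recurrence_time(LENGTH2, skew))
    print(mean_recurrence_time(CODE3, skew))
```

In both examples $\mu A^* = 1$, so the event is persistent, and the mean recurrence time is an integer that does not depend on the letter probabilities tried, as III.3 would lead one to expect.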


III.4. If $(B', \mu)$ is a recurrent event and if $A$ is birecurrent we have

$T(A \cap B', \mu) = \bar{\delta}^* \, T(B', \mu)$

where $\bar{\delta}^*$ is defined below.


Proof. Let $B = \{b \in B' : \mu b > 0\}$ and $C = A \cap B$; it is easily verified that $(B, \mu)$ is again a recurrent event and that, according to III.1, we have

$T(A \cap B', \mu) = T(A \cap B, \mu) = \mu(F - C^*F)$,
$T(B', \mu) = T(B, \mu) = \mu(F - B^*F)$.

We keep the notations used in the proofs of II.6 and II.7 and we observe that, by taking into account II.6 and the condition $M_1$ on $\mu$, the remark III.4 is equivalent to the relation $\mu(B - C^*B) = \bar{\delta}^*$. In order to prove this identity we define a measure $\nu$ on $Q$ by the relation $\nu \beta b = \mu b$, for all $b \in B$; because of $M_1$ and of the definition of $B$, $\nu$ is a positive product measure and, since we know that $P = \beta C$ is birecurrent, $(P, \nu)$ is a recurrent event on $Q$. Because of III.1 and III.3, $T(P, \nu) = \nu(Q - P^*Q) = \bar{\delta}^*$. But, by definition, $\nu(Q - P^*Q) = \nu \beta(B - C^*B) = \mu(B - C^*B)$ and the result is proved.


III.5. If $\delta^*$ is finite, and $(B', \mu)$ persistent for some measure $\mu$ which satisfies the condition that for every $f \in F$ at least one element from $FfF$ has positive measure, then $\bar{\delta}^*$ is a divisor of $\delta^*$.

Proof. Because of the conditions satisfied by $\mu$ and $\delta^*$ we can find an element $f$ such that $\delta f = \delta^*$ and that $\mu f > 0$; we have $f = b's'$ with $b' \in B$ and $s' \in F - B^*F$. Because $(B, \mu)$ is persistent, it follows from III.1 that $\mu(B^* \cap s'F) = \mu s'$. Since this last quantity is positive, there exists at least one element $b \in B^* \cap s'F$. Finally, because of II.2 we have $\delta b'b = \delta^*$ with $b'b \in B$. Thus, we can apply II.8 and the result is proved.

The next statement is intended to give a characterization of the birecurrent supports in terms of their intersection with other recurrent events; by $E$ we mean any fixed birecurrent support such that $T(E, \mu)$ is finite for some positive product measure $\mu$; $E^*$ is defined as usual and we say that $(E', \mu')$ belongs to the family $((E))$ if the two following conditions are met:

(i). $(E', \mu')$ is a recurrent event on $F$;

(ii). there exists a finite integer $m$ such that any element from $E'^*$ is the product of $m$ words from $E^*$.

It is trivial that under these hypotheses $E'$ is birecurrent. Since $F$ itself is a birecurrent support (with $F^* = X$) a simple example of a family $((E))$ is the family of the birecurrent events $(F_{(m)}, \mu_m)$ where $F_{(m)}$ is the set of all words whose length is a multiple of $m$ and where $\mu_m$ is a suitable measure.


III.6. If the recurrent support $A$ is such that $(A \cap E', \mu')$ is persistent for every $(E', \mu') \in ((E))$, then $A$ is a birecurrent support.

Proof. This is a simple application of II.3 and we use the notations of this remark. If $\alpha f$ is finite for all $f$, then we know by II.3 that $A$ is birecurrent. Thus we may suppose that $A$ and $f$ are such that $\alpha f$ is infinite and we show that $(A \cap E', \mu')$ is not persistent for some suitable $(E', \mu')$. Indeed, by the second part of II.3 we know that $f^m \in E$ for some finite positive $m$. Thus $f^m$ admits a factorization as a product of $m'$ elements from $E^*$. We take $E'$ defined by the condition $E'^* = E^{*m'}$ and $\mu'$ defined by the condition that $\mu' f^m = 1$ and $\mu' f' = 0$ for any other $f' \in E'^*$. The conditions $M_0$ and $M_1$ recalled at the beginning of this section are obviously satisfied and $T(E', \mu')$ is finite. Finally, $(A \cap E', \mu')$ cannot be persistent since $A \cap E'$ reduces to $\{e\}$ and this ends the proof.

Clearly, the conditions of III.6 are satisfied if $A$ is such that $T(A \cap B, \mu) < \infty$ for any $(B, \mu)$ with finite $T(B, \mu)$.

The next statement is a simple application of II.2.

III.7. If $A$ is birecurrent and if $\delta^*$ is finite, then, for any product measure $\mu$, the distribution of the recurrence time of $(A, \mu)$ has moments of every order.

Proof. Let $A' = \{a \in A : \mu a > 0\}$. Trivially, $A'$ is birecurrent and, by II.7, we know that every $f \in F$ has at most $\delta^*$ $A'$-factorizations. Since the distributions of the recurrence times of $(A, \mu)$ and $(A', \mu)$ are the same, there is no loss of generality in assuming that $A = A'$, i.e., that $\mu$ is positive.

Since $\delta^*$ is finite there exists an element $f \in F$ which, because of II.2, has the property that for any proper $s \in S$ the product $sf$ has a factorization $sf = ar$ with $a \in A^*A$. Thus, for any integer $n$, the definition $S = F - A^*F$ allows us to write the inequality

$\sum \{\mu f' : f' \in A^*, \ n|f| < |f'| \leq (n + 1)|f|\} = \mu A^*_{(n+1)|f|} - \mu A^*_{n|f|} \leq (1 - \mu f)^{n-1}$.

Consequently the distribution of the $|a|$ for $a \in A^*$, i.e., of the recurrence time of $A$, is dominated by an exponential distribution and this proves the result.


IV. Examples. We want to describe a class of monoids, V, which allows the 
construction of birecurrent supports. For this purpose, we consider a group G’ 
(whose elements are identified with the corresponding elements of its Frobenius 
algebra) and a subgroup H’ which contains no proper normal subgroup of G’; 
I = {7} and J = {j} are two sets of indices and w isa J X J matrix with entries 
w;; in H’. Without loss of generality we can assume that there exists no pair of 
indices j, 7’ ¢€ J (i, @# e€ I) and no element h ¢ H’ such that wih = 
wi; (hwi; = wy ;) identically for alli e J (j e J). 

We shall denote by V the set of all J X J matrices v with entries in G’ U {0} 
that have the following property: for each j ¢ J there exists an index 7’ ¢ J and 
an element g;;- ¢ G’ which are such that the product vw.; (with w.; = the jth 
column vector of w) is equal to w.;-g;;- (i.e., to the vector whose ith entry is 
equal to w;;-g;;). Trivially, this condition implies that v has one and only one 
non zero entry in each line; it also implies the existence of an isomorphism 
v — 6 which sends V onto the monoid V of the J * J matrices defined by the 
symmetric condition and which is such that vw = wi, identically; V is a monoid 
and it contains as minimal ideal the set Vo of all matrices whose ith column 
vector is equal to w.,g (with any i e J, 7 ¢ J, g ¢ G’) and whose 7’th column 
vector is zero for i’ # 7. 

IV.1. The subset L C V of the matrices of V which have at least one entry 
in H’ satisfies U, and U,. 

Proor. L is not empty since it contains at least the neutral element of V. Let 
us assume that v ¢ L and that v,; ¢ H’. Because of the hypothesis that all the 
entries of w belong to H’, the ith coordinate of vw.; for any j ¢ J, (that is, v;;-w,-;) 
belongs to H’. Thus, vw.; = w.;-h for some 7’ ¢ J and h ¢ H;; it follows that all 
the non zero entries of v belong to H’. This shows that ZL is a monoid and, trivi- 
ally, that it satisfies U, and U,. 


IV.1’. If F is a free monoid and 7’: F — V an homomorphism, then the subset 
A = 7 (LN 7’F) is a birecurrent support and the corresponding parameter, 
5*, is at most equal to the index of H’ in G’. 

Proof. The first part of the statement does not need a proof; we verify the
second part by showing that for any f ¢ F (with the notations of II.2) there exists 
an injection of ARf into the set of the left H’-cosets. Let r, r’ « ARf with, e.g. 
r’ = rf’; for any 7 ¢ I, the condition (y’r);:;, # O defines in a unique manner 
’ eITandg = (7’r) € G’. In a similar way, we define 7” ¢ J and g’ ¢ G’ by 
the condition 0 # (7’f’) vin (= g’). Since (7’r’) ive = (y'rf’) ise = gg’ we see that 
g and g’ belong to the same H’-coset if and only if g ¢ H’, that is, if and only 
if f’ ¢ A, that is, finally, if and only if r = r’ and this ends the proof. 

Reciprocally, if A is a birecurrent support with finite 6* we can take (with the 
notations of 11.5) G’ = Gand H’ = H and find, J, J and w such that yF = M 
is a submonoid of V. Then Vo C 7F and a sufficient condition that yf ¢ Vo is 
éf = 65*. We shall not prove these results here since they are a straightforward 
application of Clifford’s theory [4]. 


IV.1”. If 5* is finite and if for each f ¢ F there exists a finite positive m such 
that yf" « Vo, then the parameter 6* defined in II.7 is always a divisor of 6*. 

Proor. We consider the group G’ defined in II.8. According to the general 
theory of monoids [4] the only groups contained in yF under the hypothesis of 
IV.1” are in fact contained in Vp». Consequently, they are isomorphic to sub- 
groups of G and this concludes the proof. 


IV.2. If A is a birecurrent support such that A* is a finite set then either there 
exists an s ¢ S for which s’ NM A = ¢ (and then (A, y) is not persistent for any 
positive product measure ») or else, the conditions of IV.1” are satisfied by A. 
In this second case, yF is a group if and only if 4* reduces to the set of all the 
words having some fixed finite length. [5]. 

Proor. We assume that A* is finite and that AN sF = ¢ for all s ¢ S; then, 
by the very definition of y the monoid yF is finite. By II.2 we see that if r, 
r’ ¢ ARf for some f e F, then the equation yr = yr’ implies r = r’. Thus, the 
parameter 6* is finite. Let us take any element f ¢ F; the hypothesis that 6f < 6* 
implies that for some pair (f’, f”) one has fff” « A*. Thus for all f ¢ F, 6f" = 4* 
for large enough m since, otherwise, A* would not be finite. This proves that A 
satisfies the conditions of IV.1”. 

We now make the supplementary assumption that yF is a group G with 
yA = H, and we consider a, an element of maximal length of A*. If ja} = 1 
the result is vacuously true since, then, A = F. If ja} 2 2 we write a = sza’ 
with x, x’ e X. Because of U,, no'left factor of a belongs to A* and because of 
the maximality of |a|, we have sxx” ¢ A for all x” ¢ X. Thus, all the generators 
of F belong to the same left H-coset. For this reason, we cannot have sx” ¢ A* 
for any x” ¢ X and, because again of the maximal character of |a| this implies 
that sx”2’” ¢ A* for any two 2”, x’” e X. Thus, for any two elements 2, x’ ¢ X, 
the left coset xz’H does not depend upon the choice of x and 2’. If |a| = 2, this 
proves the result. If |a| = 3 we can write s = s’y with y e X and by the same 
argument we prove that for any z, x’, x” ¢« X the coset rz’x”H does not depend 
upon the choice of these three elements. Since |a| is finite, by hypothesis, a simple
induction gives the result. 

The next statements discuss the existence of birecurrent supports with finite 
5*. Without loss of generality, we shall assume from now on that X contains a 
finite number 22 of elements. 


IV.3. For any finite n 2 3 there exist infinitely many different birecurrent 
supports with this value of 6*. 

Proor. In the next section we shall show the existence of at least one birecur- 
rent support with 6* = 2 and A* infinite. In this section we show that to every 
birecurrent support A and element u ¢ A* we can associate one other birecurrent 
support B with 65 = 6* + 1 and B* infinite and that, for the same A* and 
different choice of u ¢ A*, the two corresponding new supports are different. 
Thus IV.3. will be entirely proved with the help of IV.4.. Let us now take u ¢ A*, 
a fixed element, and define: J = (uF N Fu) — {uj}; J* = J — J’ (ie., = the 
subset of those elements of J that cannot be written as the product of two ele- 
ments of J). With the help of II.1”, it is easily verified that there exists a bire- 
current support B which is such that B* = J* U (A* — {u}) and we prove that 
for all f ¢ F the number (say, 5(B, f)) of its B-factorizations is at most equal to 
5f + 1. In order to do this, we slightly extend the notations of II.2, and for any 
subset F’ of F we say that the triple (f”, f’, f’’) is a F’-factorization of f if f’ e F’ 
and f” ff’ = f; also, we denote by 6(F’, f) the number of distinct F’-factoriza- 
tions of f and we observe that by induction on the length of f, the result of II.3 
can be summarized by the identity |f| + 1 = 6(A, f) + 6(A*,f). 

Here, we have 


5(A*, f) = 6(A* — ful, f) + S({u}, f), 
5(B*, f) = 5(A* — {uj}, f) + 6(J*, f), 


We want to show that 6(B*, f) = 6(A*,f) + 1. If 6({u}, f) = 0 or 1, we have 
5(J*, f) = O and the result is proved; consequently, we assume now 
that 6({u},f) 2 2 and we consider two { u}-factorizations (fi , u, fi ) and (fe, u, fo) 
with, e.g. |fi| < | fol. The element w determined by the equation f = fiwf2 belongs 
to J; it belongs to J* if and only if there is no {u}-factorization (f;, u, f3) for 
which |f;) < |fs| < |fe|; it follows instantly that 6(J*, f) = d({u}, f) — 1 and 
the result is proved. 


IV.4. For each finite n 2 3 there exist at least two different birecurrent sup- 
ports with A* finite and 6* = n. 

Proor. One of these supports has been described in IV.2; in order to produce 
the other one, we take a birecurrent support A, a fixed element u ¢ (F — A*F)N 
(F — FA*) and we construct another birecurrent support B with 65 = 6*; in 
the last part of the proof we verify that by a proper choice of u and A* we can 
make B* finite. 

Let the following sets be defined:
C* = A* — A*¥f) (uF U Fu), 
Z = {f: uf e A* — A*N Fu, 
Z’ = {f: fue A* — A*N uF}, 
J* = A*fM uFN/ Fu, 
P* = {f: fue A*N uF}. 


Thus, A* admits a partition into the sets C*, uZ, Z’u and J*; by construction, 
there exists a recurrent support P such that P* = P* — P (with P = {e} if P* 
is empty) and one can verify that there exists a birecurrent support B such that 
B* admits a partition into the sets C*, |u} and Z’PuZ. 

In order to verify that 6; = 6* we take an arbitrary positive product measure 
u and, for any F’ C F, we write T(F’) as an abbreviation for >> (\f| uf: f ¢ F’). 
Thus, by III.3, we have, e.g., 6* = T(A, uw) = T(A*). 

By a simple computation, we obtain when 6* is finite: 6* = T(A*) = T(C*) + 
T(P*) + |ul(uZ + wZ’ + wP*)pu + (T(Z) + T(Z’) + T(P*)) wu. Also, 
pZ = pZ! = 1 — pP*; P= (1 — wpP*)"; T(P) = (1 — wP*)°T(P*). Now, 
T(B*) (= 6&5) is equal to the sum T(C*) + |uluu + T(Z'PuZ); because of 
the above relations, we have T(Z’PuZ) = |\uluupZ + (T(Z) + T(Z’) + 
T(P*))yu and this concludes the second part of the proof. 

Let us now observe that B* is finite if and only if C* is finite and P = {el}. 
The first condition is surely satisfied when A®* is finite and the second one is 
equivalent to P* = ¢, that is, to A*N uF N Fu = ¢. 

Thus, if A* is the set of all words of length n > 2 and if x, , x. & X, the word 
u = x} x belongs to F — A*F and to F — FA* and it satisfies our last condi- 
tion; this ends the proof of IV.4. 

If we take n = 2 and u = 2, we find that P* = 2, and the corresponding B* 
is infinite; this is the example needed for IV.3. 


IV.5. For each finite $n$ there exists only a finite number of birecurrent supports $A$ with $\delta^* = n$ which satisfy one or the other of the two following supplementary conditions: that $\gamma F$ is a group or that $A^*$ is finite.

Proof. This is obvious for the first condition since, because of II.4', it amounts to the fact that for any finite $n$ there exist only finitely many groups of permutations on $n$ symbols.


With respect to the second condition we first verify the following elementary remark: let $K_0 = F - \{e\}, K_1, K_2, \cdots$ be a decreasing sequence of subsets of $F$ defined inductively by the relation $K_{i+1} = \{fFf : f \in K_i\}$. If $X$ is finite there exists for every finite $i$ a finite value $d(i)$ which is such that every word of length at least $d(i)$ has at least one factor belonging to $K_i$. Indeed, if $d(i)$ has already been defined, we take $d(i + 1)$ as $d(i)(1 + |X|^{d(i)})$ where $|X|$ denotes the number of elements of $X$. Then, every word of length $d(i + 1)$ contains at least two disjoint identical factors of length $d(i)$ and the result follows by induction.
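The recursion for $d(i)$ and the pigeonhole step behind it can be made concrete. The Python sketch below is an illustration only: the starting value $d(0) = 1$ is our assumption (every non-empty word belongs to $K_0$), words are strings, and the function names are hypothetical. The second function tests whether a word contains two disjoint identical factors of a given length, which is the property used in the induction.

```python
from itertools import product


def d(i, alphabet_size):
    """Bound d(i) from d(i+1) = d(i) * (1 + |X| ** d(i)); d(0) = 1 is assumed."""
    value = 1
    for _ in range(i):
        # The right-hand side is evaluated with the current value of d.
        value *= 1 + alphabet_size ** value
    return value


def has_two_disjoint_equal_factors(word, length):
    """True if some factor of the given length occurs twice without overlap."""
    first_end = {}
    for start in range(len(word) - length + 1):
        factor = word[start:start + length]
        # Comparing with the first occurrence is enough: if any two occurrences
        # are disjoint, the later one is also disjoint from the first.
        if factor in first_end and first_end[factor] <= start:
            return True
        first_end.setdefault(factor, start + length)
    return False


if __name__ == "__main__":
    print([d(i, 2) for i in range(3)])
    words = ["".join(w) for w in product("ab", repeat=d(1, 2))]
    print(all(has_two_disjoint_equal_factors(w, d(0, 2)) for w in words))
```

The bound grows very quickly with $i$, so the exhaustive check is only practical for the first step of the induction.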

We now observe that if $f \neq e$ the hypothesis that $A^*$ is a finite set (with finite $\delta^*$) implies that $\delta ff'f \geq \inf(\delta^*, \delta f + 1)$. Indeed, this is surely true if $\delta f' > \delta f$ or if $\delta ff'f = \delta^*$; in the remaining case, i.e., in the case that $\delta f = \delta ff'f < \delta^*$, we would have according to II.2, for all finite $m$, $\delta (ff')^m f = \delta f < \delta^*$ and, according to the same remark, there would exist for all finite $m$ at least one $a \in A^*$ admitting $(ff')^m f$ as a factor, which is impossible since $A^*$ is assumed to be finite.

Thus, by induction, every word $f$ of length $\geq d(\delta^*)$ is such that $\delta f = \delta^*$ and, consequently, it cannot be a factor of a word $a \in A^*$. This proves that for given $\delta^*$ the hypothesis that $A^*$ is finite imposes that the lengths of the words from $A^*$ are bounded and it concludes the proof (cf. [6]).

MAXIMUM LIKELIHOOD CHARACTERIZATION OF DISTRIBUTIONS
By HENRY TEICHER
New York University and Purdue University


1. Introduction. It is a commonplace observation that the sample mean and 
sample variance from a normal population (based on a random sample) are 
stochastically independent. Considerably less prosaic is the converse proposition, 
first proved in 1936 by R. C. Geary [2] (under superfluous restrictions), to the 
effect that the independence of these two statistics entails normality of the under- 
lying population. This, plus the theorem that if two linear combinations (non- 
zero coefficients) of a pair of independent random variables are themselves inde- 
pendent, the variables are normally distributed, which was proved by Kac in 
1939 [4], are harbingers of what are today referred to as characterization theorems. 
An extensive bibliography of such theorems appears in [5]. Most of these results 
have the format: if such-and-such statistics are independent (alternatively, if 
the distribution of such-and-such a statistic is thus-and-so), the underlying 
population is so-and-so. 

The ensuing theorems belong to this genre but adopt a maximum likelihood 
posture. The first deals with translation (location) parameter and the latter 
with scale parameter families of distributions. 


2. Preliminaries. Since the results expounded here concern maximum likeli- 
hood estimators, it would seem appropriate to say a few words concerning these. 
It is somewhat surprising that major treatises on mathematical statistics and 
estimation do not define maximum likelihood estimators per se but merely a 
maximum likelihood estimate. (Pitman’s terminological demarcation between 
these notions will be made explicit shortly.) The definitions of [8], [9] are closest 
in spirit to that given here. 

In order to pave the way for a discussion of these questions, let $F(x; \theta)$,
$-\infty < x < \infty$, $\theta \in \Omega \subset R^1$ denote a one parameter family of probability
distributions on the real line $R^1$ with spectra $S_\theta$. Define $S = \bigcup_{\theta \in \Omega} S_\theta$ and
$S^n = S \times S \times \cdots \times S$, the n-fold cartesian product of $S$ with itself.^1 If, for
each $\theta \in \Omega$, $F(x; \theta)$ is absolutely continuous, designate its probability density
function (p.d.f.) by $f(x; \theta)$; if, for each $\theta$, $F(x; \theta)$ is a step function, the
same notation $f(x; \theta)$ will be used to specify the so-called discrete p.d.f.,
that is, the mass function of the corresponding distribution (positive at the
countable set of points constituting $S_\theta$ and zero elsewhere).

The customary definition of a maximum likelihood estimate of a parameter $\theta$
of a population (family of distributions generally restricted to the aforementioned
types), based on a (random) sample of $n$ observations $x_1, x_2, \cdots, x_n$, is a value
of $\theta$, say $\hat\theta_n$, which renders $\prod_{i=1}^{n} f(x_i; \theta)$ a maximum. A maximum likelihood
estimator (M.L.E.) is presumably a function $\hat\theta_n = \hat\theta_n(x_1, x_2, \cdots, x_n)$ from $S^n$
into $\Omega$ which, for every choice of $x_1, \cdots, x_n$, is a maximum likelihood estimate.
(It is by no means apparent that a M.L.E., so defined, is a bona fide random
variable and it would seem of interest to give minimal conditions under which it is
measurable, [7]. Measurability is, however, tangential to the problem treated
here.)

1 When $\bigcup_\theta S_\theta = [a, \infty)$ or $(-\infty, b]$, the points $a$, $b$ will be deleted in defining $S$
(so as to avoid a special treatment of the origin in Theorem 2).

Unfortunately, from the theoretical standpoint, such a definition harbors am- 
biguities of a petty but disconcerting nature. The fact that these annoyances 
crop up on sets of measure zero does not seem sufficient reason to ignore their 
existence. 

First, (consider the absolutely continuous case) as a consequence of the fact
that for each $\theta \in \Omega$, $f(x; \theta)$ is defined only to within a set of measure zero, it is
possible to change a M.L.E. by altering $f(x; \theta)$ at one value of $x$ for each $\theta$ or
even prevent its existence by a perverse choice of $f(x; \theta)$. If scale and translation
parameter families are involved, $f(x; \theta)$ is a function of one variable only and the
scope for tampering is greatly diminished.

Suppose that a suitable version of $f(x; \theta)$ has been singled out. Then, a M.L.E.
$\hat\theta_n$ will be interpreted as a function from $S^n$ into $\Omega$ satisfying

(0)   $\prod_{i=1}^{n} f(x_i; \hat\theta_n) \ge \prod_{i=1}^{n} f(x_i; \theta)$

for all $\theta \in \Omega$ and all $(x_1, \cdots, x_n) \in S^n$. (Note that if $R^1 - S$ is non-empty and
$\hat\theta_n$ were assigned any value in $\Omega$ for $(x_1, \cdots, x_n) \in R^n - S^n$, (0) would hold in
the degenerate form 0 = 0.)

Secondly, if all $(x_1, \cdots, x_n)$ in $S^n$ are pertinent to the definition of a M.L.E.,
how is one to interpret $0 \cdot \infty$ if it occurs in (0)? (A p.d.f. may be infinite on a set
of measure zero.) The conventional interpretation of this product as zero seems
mandatory when the value $x_i$ for which $f(x_i; \theta) = 0$ belongs to $R^1 - S$ (if this
set is non-empty) and will be adopted for $x_i \in S$ as well.


3. Characterization theorems. Theorem 1 which deals with translation parameter
families emerges as a generalization and modernization of a result of Gauss
[1] when the latter is suitably interpreted and rescued from its context of least
squares.^2

THEOREM 1: Let $\{F(x - \theta), \theta \in R^1\}$ be a translation parameter family of absolutely
continuous distributions on the real line and let the version of the p.d.f. $f(x)$
be lower semi-continuous at $x = 0$. If, for all (random) samples of sizes two and
three, a maximum likelihood estimate of $\theta$ is the sample arithmetic mean, then $F(x)$
is a normal distribution with mean zero.

2 There is a vast literature consisting of discussions, proofs and reproofs of Gauss' result.
In view of the fact that the latter was formulated in a least squares context and further
that many notions and distinctions which are today commonplace were then only dimly
(if at all) perceived, many of the disquisitions are heuristic and unrigorous by modern
standards. Among the multitude of commentaries, three ([10], [11], [12] p. 169) are singled
out for reference.

All prior proofs which have come to the writer's attention assume implicitly or explicitly
that the density function $f(x)$ is differentiable.

PROOF: If $\bar{x} = (n + 1)^{-1} \sum_{i=1}^{n+1} x_i$, it follows from (0) and the hypothesis
that for $n = 1, 2$ and all real $x_1, x_2, \cdots, x_{n+1}$ and $\theta$,

$\prod_{i=1}^{n+1} f(x_i - \bar{x}) \ge \prod_{i=1}^{n+1} f(x_i - \theta);$

hence,

(0)'   $\prod_{i=1}^{n+1} f(y_i) \ge \prod_{i=1}^{n+1} f(y_i - \theta)$

for $n = 1, 2$ and all real $\theta$ and $y_1, \cdots, y_{n+1}$ satisfying $\sum_{i=1}^{n+1} y_i = 0$.
Set $n = 1$, $y_1 = y = -y_2$ in (0)' to obtain

(1)   $f(y) f(-y) \ge f(y - \theta) f(-y - \theta)$,   all real $y$, $\theta$.


Note that $f(0) = 0$ implies $f(y)$ vanishes identically. Suppose that $f(a) = \infty$
for some real number $a$. Then, according to (1), for each $y \in R^1$ either
$f(y) f(-y) = \infty$ or $f(2y + a) = 0$. For a p.d.f., the former cannot hold on a set
of positive (Lebesgue) measure, while the latter cannot hold almost everywhere.
Thus, every p.d.f. satisfying (0)' is positive at the origin and everywhere finite
(so the product $0 \cdot \infty$ will not arise in (0)').

Let $h(x) = \log_e f(x)$ where $h(x)$ may possibly assume the extended real
value $-\infty$. Then it follows from (0)' that for $n = 1, 2$ and all real $y_1, y_2, \cdots, y_n$, $\theta$,

(2)   $\sum_{i=1}^{n} h(y_i) + h\left(-\sum_{i=1}^{n} y_i\right) \ge \sum_{i=1}^{n} h(y_i - \theta) + h\left(-\sum_{i=1}^{n} y_i - \theta\right).$

As will be seen, under the meager assumptions contained in the probabilistic
framework, the functional inequality (2) determines $h(x)$ and therefore $f(x)$.
In particular, (2) implies

$n h(y) + h(-ny) \ge n h(y - \theta) + h(-ny - \theta),$

which, for $n = 1$, becomes

(3)   $h(y) + h(-y) \ge h(y - \theta) + h(-y - \theta).$

Note that if in (2), $\theta$ is replaced by $-\theta$ and $y_i$ by $-y_i$, the resulting inequality
when added to (2) reveals that, if $g(y)$ is a solution of (2), so is $h(y) = g(y) + g(-y)$
and we therefore confine attention at first to symmetric solutions of (2),
which then takes the form

(2)'   $\sum_{i=1}^{n} h(y_i) + h\left(\sum_{i=1}^{n} y_i\right) \ge \sum_{i=1}^{n} h(y_i - \theta) + h\left(\sum_{i=1}^{n} y_i + \theta\right).$

Similarly, (3) becomes

(3)'   $2h(y) \ge h(y - \theta) + h(y + \theta)$,   all $y$, $\theta$.


Suppose that $h(y) = -\infty$ for some $y > 0$ and let $c$ be the infimum of the
set $A$ of all such positive values $y$. Taking $n = 2$, $y_1 = y_2 = \tfrac{1}{2} c_m$, $\theta = -\tfrac{1}{4} c_m$ in
(2)' yields $2h(\tfrac{1}{2} c_m) + h(c_m) \ge 3h(\tfrac{3}{4} c_m)$. Choose $c_m \downarrow c$, $c_m \in A$, $m = 1, 2, \cdots$,
implying $h(\tfrac{3}{4} c_m) = -\infty$, $m \ge 1$. If $c > 0$, a contradiction ensues, while
if $c = 0$, $f(x)$ is not lower semi-continuous at zero.

Thus, $h(y)$ is everywhere finite and according to (3)' concave. Since any
p.d.f. $f(x)$ is necessarily measurable, $h(x)$ is likewise, whence [3, 6], $h$ is a
continuous concave function.

Let $D$ denote the complement of the (at most countable) set of points at
which $h(x)$ is not differentiable and denote by $q(x)$ the derivative of $h(x)$. Then
$q(x)$ is monotone and defined for all $x \in D$.

The fact, as expressed by (2)', that $\theta = 0$ maximizes $\sum_{i=1}^{n} h(y_i - \theta) + h\left(\sum_{i=1}^{n} y_i + \theta\right)$
now requires that

(4)'   $-\sum_{i=1}^{n} q(y_i) + q\left(\sum_{i=1}^{n} y_i\right) = 0$

for all $y_i \in D$ such that $\sum_{i=1}^{n} y_i \in D$, $n = 1, 2$. For $n = 2$, (4)' becomes

(5)'   $q(y_1) + q(y_2) = q(y_1 + y_2)$   for $y_1, y_2, y_1 + y_2 \in D$.

Let $C = \{f\}$ be the class of non-negative measurable functions on $R^1$ which
are everywhere finite, lower semi-continuous at zero, and do not vanish almost
everywhere. Let $C'$ be the subclass of functions in $C$ which do not vanish anywhere.
Since the only monotone solution of Cauchy's functional equation (5)' is
$q(y) = ay$, it follows that the only symmetric functions of $C$ satisfying (0)'
(which are necessarily in $C'$) are given by $h(y) = \log_e f(y) = -cy^2 + d$ for
$y \in D$ and therefore by continuity for all real $y$.

Suppose next that $f(y)$ is any element of $C$ satisfying (0)'. According to (1),
$f(y) \cdot f(-y)$ does not vanish almost everywhere (take $\theta = -y$); it is readily seen
that $f(y) \cdot f(-y)$ is a symmetric function in $C$ and, as previously noted, a solution
of (0)'. Thus $f(y) \cdot f(-y) \in C'$ implying $f(y) \in C'$.

In fact, necessarily for some real constants $c$ and $d$,

$g(y) = \log_e f(y) = -\tfrac{1}{2}(cy^2 - d) + b(y)$

where $b(y)$ is an odd function. For, by the preceding, $g(y) + g(-y) = -cy^2 + d$
for some $c$, $d$ and this implies that $b(y) = g(y) + \tfrac{1}{2}(cy^2 - d)$ satisfies
$b(y) = -b(-y)$.

Substituting for $g(y)$ in (3) yields

(4)   $c\theta^2 \ge b(y - \theta) - b(y + \theta)$,   all $y$, $\theta$.

Replacing $y$ by $-y$ and $\theta$ by $-\theta$ in (4) and combining the result with (4), produces

(5)   $|b(y - \theta) - b(y + \theta)| \le c\theta^2$,   all $y$, $\theta$,

which, in turn, necessitates $c \ge 0$ and implies that $b(y)$ is differentiable and
constant, hence identically zero.

Consequently, the only solutions of (0)' in $C$ are given implicitly by
$h(x) = -\tfrac{1}{2} c x^2 + d$, $c \ge 0$ and thus the only p.d.f.'s satisfying the conditions of the
theorem are $f(x) = (c/2\pi)^{1/2} e^{-cx^2/2}$, that is, normal density functions with mean
zero.

REMARK 1: The integers two and three of the theorem may clearly be replaced 
by other pairs, e.g., 2k, 3k, k > 1. It seems most desirable, however, to state the 
result with minimal n. 

REMARK 2: If $\int x f(x)\,dx$ exists and is zero, the translation parameter $\theta$ is
the mean of the distribution $F(x - \theta)$. In such cases (excluding the normal),
the theorem implies that the sample mean is not (for samples of sizes both two
and three, a fortiori, for all $n$) a maximum likelihood estimator of the population
mean. This is readily seen, for example, if

$f(x; \theta) = C_\alpha \cdot \exp\{-|x - \theta|^{\alpha}\}$,   $\alpha \ne 2$.

REMARK 3: It seems of interest to note in the case where $F(x - \theta)$ is a rectangular
distribution with mean $\theta$, i.e., $f(x) = 1$ for $|x - \theta| \le \tfrac{1}{2}$ and zero otherwise,
that, whereas $\bar{x}$ is a M.L.E. of $\theta$ for $n = 2$, its numerical value is not a
maximum likelihood estimate of $\theta$ for all random samples of size three.
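
Remark 3 lends itself to a quick numerical check. The following is only an illustrative sketch in Python; the helper names `uniform_likelihood` and `mean` and the sample values are assumptions made for the example, not part of the paper. It evaluates the likelihood of the rectangular density of unit width at the sample mean for one sample of size two and one of size three.

```python
def uniform_likelihood(theta, sample):
    """Likelihood of theta when f(x) = 1 for |x - theta| <= 1/2 and 0 otherwise."""
    return 1.0 if all(abs(x - theta) <= 0.5 for x in sample) else 0.0


def mean(sample):
    return sum(sample) / len(sample)


# n = 2: the midpoint of two admissible observations attains the maximal value 1.
pair = [0.0, 0.9]
lik_pair = uniform_likelihood(mean(pair), pair)

# n = 3: the mean 0.3 lies outside [max - 1/2, min + 1/2] = [0.4, 0.5],
# so its likelihood is 0 although theta = 0.45 attains likelihood 1.
triple = [0.0, 0.0, 0.9]
lik_triple_mean = uniform_likelihood(mean(triple), triple)
lik_triple_best = uniform_likelihood(0.45, triple)
```

For the triple chosen here the sample mean is therefore not a maximum likelihood estimate, which is the phenomenon the remark describes.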


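4. Scale parameter families. Consider a scale parameter family^3 of absolutely
continuous distributions $\mathfrak{F} = \{F(x/\sigma), \sigma > 0\}$. The joint density function of $n$
independent random variables, each distributed as $F(x/\sigma)$, is $\sigma^{-n} \prod_{i=1}^{n} f(x_i/\sigma)$
where $F(x) = \int_{-\infty}^{x} f(u)\,du$. To say that $\hat\sigma = \hat\sigma(x_1, \cdots, x_n)$ is a maximum
likelihood estimator of $\sigma$ is to say for all $\sigma > 0$ and $(x_1, \cdots, x_n)$ in $S^n$ that

$\hat\sigma^{-n} \prod_{i=1}^{n} f(x_i/\hat\sigma) \ge \sigma^{-n} \prod_{i=1}^{n} f(x_i/\sigma).$

Let $y_i = x_i/\hat\sigma$, $\lambda = \hat\sigma/\sigma$. Then, if $\hat\sigma$ is a homogeneous function of degree one in
$x_1, \cdots, x_n$, the preceding implies

(6)   $\prod_{i=1}^{n} f(y_i) \ge \lambda^{n} \prod_{i=1}^{n} f(\lambda y_i)$

for all $\lambda > 0$ and $y_1, \cdots, y_n$ satisfying

(7)   $\hat\sigma(y_1, y_2, \cdots, y_n) = 1.$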


If $h(y) = \log_e f(y)$ is finite valued, (6) may be transcribed as

(8)   $\frac{1}{n} \sum_{i=1}^{n} [h(y_i) - h(\lambda y_i)] \ge \log_e \lambda.$

Inspection shows that $h(y) = -\log_e y + \text{const.}$ satisfies (8), with equality
holding for all choices of $y_1, \cdots, y_n$ and a fortiori for $y_i$'s satisfying (7). However,
$f(y) = cy^{-1}$ is not integrable on $(0, \infty)$ and truncation to a finite interval
will be precluded by the conditions of Theorems 2 and 3.

3 The standard device of reducing a scale parameter family to a translation parameter
family (so as to utilize Theorem 1) appears fruitless here.

In the following theorems the indispensable absolute continuity assumption
is augmented by a possibly dispensable continuity assumption of the density
function. The seemingly ad hoc condition (ii) on the other hand, appears to be
crucial.

THEOREM 2: Let $\{F(x/\sigma), \sigma > 0\}$ constitute a scale parameter family of absolutely
continuous distributions with the version of the p.d.f. $f(x)$ satisfying

(i) $f(x)$ is continuous on $(0, \infty)$

(ii)^4 $\lim_{y \downarrow 0} \frac{f(\lambda y)}{f(y)} = 1$,   all $\lambda > 0$.

If, for all sample sizes, a maximum likelihood estimator of $\sigma$ is the sample
arithmetic mean, then $F$ is the exponential distribution, i.e.,

$f(x) = e^{-x}$, $x > 0$,   $f(x) = 0$, $x \le 0$.
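
The easy direction behind Theorem 2, namely that for the exponential density the sample mean does maximize the scale likelihood $\sigma^{-n} \prod f(x_i/\sigma)$, can be illustrated numerically. The Python sketch below is not taken from the paper; the function name `scale_log_likelihood`, the sample values and the grid are illustrative assumptions.

```python
import math


def scale_log_likelihood(sigma, sample):
    """log of sigma^(-n) * prod f(x_i / sigma) for f(x) = exp(-x), x > 0."""
    n = len(sample)
    return -n * math.log(sigma) - sum(sample) / sigma


sample = [0.3, 1.7, 2.2, 0.8]        # illustrative positive observations
x_bar = sum(sample) / len(sample)    # sample mean, 1.25 here

# Compare the log-likelihood at the sample mean with a grid of other scales;
# a tiny tolerance absorbs floating-point rounding near the maximum.
grid = [0.05 * j for j in range(1, 200)]
mean_is_maximal = all(
    scale_log_likelihood(x_bar, sample) >= scale_log_likelihood(s, sample) - 1e-12
    for s in grid
)
best_on_grid = max(grid, key=lambda s: scale_log_likelihood(s, sample))
```

The theorem itself asserts the converse, which no finite computation can establish; the sketch only confirms that the exponential case is consistent with the hypothesis.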


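PROOF: Since $\Omega = \{\sigma: \sigma > 0\}$ and $\bar{x}$ is a posited M.L.E., necessarily
$S \subset S' = \{x: x > 0\}$, whence $f(x) = 0$ in $R^1 - S'$. It suffices, therefore, to consider $f(x)$,
$x \in S'$, noting that $f(x) \not\equiv 0$ in $S'$. Infinite values of $f$ are precluded by continuity
and it will now be shown that $f(x) > 0$ in $S'$, i.e., $S = S'$.

From prior remarks, (6) obtains with (7) becoming

(7.1)   $\sum_{i=1}^{n} y_i = n.$

In (6), choose $y_i = k/m$, $1 \le i \le m$ and $y_i = (n - k)/(n - m)$, $m + 1 \le i \le n$,
where $k$, $m$, $n$ are positive integers satisfying $k < m < n$; this yields

(9)   $f^{m}\left(\frac{k}{m}\right) f^{n-m}\left(\frac{n-k}{n-m}\right) \ge \lambda^{n} f^{m}\left(\frac{\lambda k}{m}\right) f^{n-m}\left(\frac{\lambda(n-k)}{n-m}\right).$

Let $k/m \to a$, $m/n \to c$. Then for all positive $\lambda$ and all $c$, $a$ in $(0, 1)$

(10)   $f^{c}(a)\, f^{1-c}\left(\frac{1 - ac}{1 - c}\right) \ge \lambda\, f^{c}(\lambda a)\, f^{1-c}\left(\frac{\lambda(1 - ac)}{1 - c}\right).$

Now, if there exists a sequence $a_n \to 0$ with $f(a_n) > 0$, $n = 1, 2, \cdots$, it follows
from (10) and (ii) that

(11)   $f(y) \ge \lambda^{y} f(\lambda y)$   for $y \ge 1$, $\lambda > 0$.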


Thus, if $f$ vanished for some $y \ge 1$, it would vanish identically. Further, from
(6), if $f(y)$ were zero for some $y$ in $(0, 1)$, $f(y)$ would have zeros in $(1, \infty)$.
Alternatively, suppose that for some $\delta > 0$, $f(y) = 0$ in $(0, \delta)$. Since $f(y) \not\equiv 0$
in $S'$, (6) insures ($y_i = 1$) that $f(1) > 0$, whence $\delta < 1$ and there is no loss of
generality in supposing that $(0, \delta)$ is the maximal interval in which $f$ vanishes
identically. Then, from (10) follows

(12)   $0 = \lambda\, f^{c}(\lambda\delta)\, f^{1-c}\left(\frac{\lambda(1 - \delta c)}{1 - c}\right)$

for all $\lambda > 0$ and $c$ in $(0, 1)$. By continuity, $f(y) > 0$ in $(1 - \epsilon, 1 + \epsilon)$ for all
sufficiently small $\epsilon > 0$. Hence, taking $\lambda = (1 - \epsilon)/\delta$ in (12),

$f[(1 - \epsilon)(1 - \delta c)/\delta(1 - c)] = 0$

for $0 < c < 1$ implying $f[(1 - \epsilon)/\delta] = 0$, all sufficiently small $\epsilon > 0$.
Now, let $k + 1$ be an integer greater than $(1 - \epsilon)/\delta$. From (6),

$0 = f[\lambda(1 - \epsilon)/\delta] \cdot f^{k}\{\lambda[k + 1 - (1 - \epsilon)\delta^{-1}]/k\}.$

Hence, $f\{[\delta/k(1 - \epsilon)][k + 1 - (1 - \epsilon)\delta^{-1}]\} = 0$ for all sufficiently large $k$ and
all sufficiently small $\epsilon$, implying $f[\delta/(1 - \epsilon)] = 0$ for all sufficiently small $\epsilon > 0$,
which contradicts the maximality of $\delta$.

4 This condition is automatically fulfilled if $0 < \lim_{x \downarrow 0} f(x) < \infty$. Also, it is reiterated
that only random samples are under consideration.

Thus, any p.d.f. satisfying (6) is non-zero in $S'$. Consequently, (11) holds
unconditionally and may be rewritten as

(13)   $y^{-1}[h(y) - h(\lambda y)] \ge \log_e \lambda$,   $y \ge 1$, $\lambda > 0$

where, as before, $h(y) = \log_e f(y)$.
Replace $\lambda$ by $\lambda^{-1}$ in (13) and combine the result with (13) to obtain

(14)   $0 \ge h(\lambda y) - 2h(y) + h(y/\lambda)$,   $1 \le \lambda \le y$.

This asserts that $H(y) = h(e^{y})$ is concave for $y \ge 0$ and hence that $h(y)$ is
differentiable in $(1, \infty)$ except perhaps for a countable subset $D$ thereof. From
(13), for $\lambda < 1$ and $y \ge 1$,

$\frac{h(\lambda y) - h(y)}{y(\lambda - 1)} \ge \frac{\log_e \lambda}{1 - \lambda} \ge \frac{h(y/\lambda) - h(y)}{\lambda y[(1/\lambda) - 1]}$

whence ($\lambda \uparrow 1$), $h'(y) = -1$ on $(1, \infty) - D$. Then by continuity,

(15)   $h(y) = -y + c$,   $y \ge 1$.


Next, choosing $y_i < 1$, $i = 1, \cdots, r < n$ and $y_i > 1$, $i > r$, in (6) and employing
(15) and (7.1), we find for all $\lambda > 0$, $r < n$ and $y_i$ in $(0, 1)$ that

(16)   $\sum_{i=1}^{r} [h(y_i) - h(\lambda y_i) + (1 - \lambda) y_i] \ge n[\log_e \lambda + 1 - \lambda].$

For $0 < y < 1$, (16) asserts ($r = 1$) that

(17)   $\frac{1}{n}[h(y) - h(\lambda y)] \ge \log_e \lambda + (1 - \lambda)(1 - (y/n)).$

But for $0 < x < 1$, $\log_e \lambda + x(1 - \lambda) > 0$ for all $\lambda$ sufficiently close to and
larger than unity. Thus, from (17), $h$ is monotone decreasing in $(0, 1)$.
Set $h = h_1 + h_2$ where $h_2$ is absolutely continuous and $h_1$ is singular, i.e.,
$h_1'(y) = 0$ a.e. in $(0, 1)$ and $h_1(y) = 0$ for $y > 1$.
Again taking $r = 1$ in (16), there follows for $0 < y < 1$ and $\lambda < 1$

(18)   $\frac{h(\lambda y) - h(y) + (\lambda - 1) y}{\lambda - 1} \ge n\left[\frac{\log_e \lambda}{1 - \lambda} + 1\right].$

This implies $y[h_2'(y) + 1] \ge 0$ almost everywhere in $(0, 1)$. Similarly, replacing
$\lambda$ by $\lambda^{-1}$ in (16), the inequality is reversed. Hence, $h_2'(y) = -1$, almost everywhere
in $(0, 1)$ implying $h_2(y) = -y + c_1$.

Utilizing this result in (18), there follows

(19)   $\limsup_{\lambda \uparrow 1} \frac{h_1(\lambda y) - h_1(y)}{y(\lambda - 1)} \ge \liminf_{\lambda \uparrow 1} \frac{h_1(\lambda y) - h_1(y)}{y(\lambda - 1)} \ge 0$

for all $y$ in $(0, 1)$; hence $h_1(y) = 0$. Also, by continuity $c_1 = c$.
Thus, for $y \in S'$, $f(y) = a e^{-y}$ and since $f$ vanishes outside $S'$, $a = 1$.

THEOREM 3: Let $\{F(x/\sigma), \sigma > 0\}$ be a scale parameter family of absolutely continuous
distributions with the version of the p.d.f. $f(x)$ satisfying

(i) $f(x)$ continuous on $(-\infty, \infty)$

(ii)^4 $\lim_{y \to 0} [f(\lambda y)/f(y)] = 1$,   all $\lambda > 0$.

If, for all sample sizes, a maximum likelihood estimator of $\sigma$ is $(n^{-1} \sum_{i=1}^{n} x_i^2)^{1/2}$,
then $F(x)$ is the normal distribution with mean zero and variance one.
PROOF: Here (7) specializes to

$\sum_{i=1}^{n} y_i^2 = n.$

Take $y_i = \pm (k/m)^{1/2}$, $1 \le i \le m$; $y_i = \pm [(n - k)/(n - m)]^{1/2}$, $m + 1 \le i \le n$.
Analogous to (10), there follows

(10)'   $f^{c}(a)\, f^{1-c}\left(\pm\left(\frac{1 - a^2 c}{1 - c}\right)^{1/2}\right) \ge \lambda\, f^{c}(\lambda a)\, f^{1-c}\left(\pm\lambda\left(\frac{1 - a^2 c}{1 - c}\right)^{1/2}\right)$

valid for $\lambda > 0$, $|a| \le 1$, $0 < c < 1$. An argument akin to that employed in
Theorem 2 shows that $f$ is non-vanishing and it follows from (10)' that

$y^{-2}[h(y) - h(\lambda y)] \ge \log_e \lambda$,   $|y| \ge 1$, $\lambda > 0$.

Again, $h$ is differentiable, this time in the region $A$: $|y| \ge 1$ except perhaps on a
countable subset $D'$ thereof. Proceeding as in the proof of Theorem 2, we find
that $h'(y) = -y$ for $y$ in $A - D'$ and hence that $h(y) = -\tfrac{1}{2} y^2 + c$ for $|y| \ge 1$.
The analogue of (16) is

(16)'   $\sum_{i=1}^{r} [h(y_i) - h(\lambda y_i) + (1 - \lambda^2)\tfrac{1}{2} y_i^2] \ge n[\log_e \lambda + \tfrac{1}{2}(1 - \lambda^2)]$

where $|y_i| < 1$ and $r < n$. This implies that $h$ is decreasing in $(0, 1)$ and increasing
in $(-1, 0)$. An argument paralleling that of Theorem 2 yields
$h(y) = -\tfrac{1}{2} y^2 + c$, for $|y| < 1$ implying $f(y) = a \exp\{-y^2/2\}$. Finally, $a = (2\pi)^{-1/2}$.


ON THE DISTRIBUTION OF FIRST SIGNIFICANT DIGITS^1


By Roger S. Pinkham


Rutgers—The State University 


Introduction. It has been noticed by astute observers that well used tables of
logarithms are invariably dirtier at the front than at the back. Upon reflection
one is led to inquire whether there are more physical constants with low order
first significant digits than high. Actual counts by Benford [2] show that not only
is this the case but that it seems to be an empirical truth that whenever one has
a large body of physical data, Farmer's Almanac, Census Reports, Chemical
Rubber Handbook, etc., the proportion of these data with first significant digit
$n$ or less is approximately $\log_{10}(n + 1)$. Any reader formerly unaware of this
"peculiarity" will find an actual sampling experiment wondrously tantalizing.
Thus, for example, approximately 0.7 of the physical constants in the Chemical
Rubber Handbook begin with 4 or less ($\log_{10}(4 + 1) = 0.699$). This is to be
contrasted with the widespread intuitive evaluation 4/9ths.
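
The cumulative values $\log_{10}(n + 1)$ are easily tabulated. The short Python sketch below is an illustration only (the variable names are assumptions for the example); it lists the cumulative proportions together with the implied probability $\log_{10}(1 + 1/d)$ of each individual first digit $d$.

```python
import math

# Cumulative proportion of first significant digits <= n under the logarithmic law.
cumulative = {n: math.log10(n + 1) for n in range(1, 10)}

# Implied probability of each individual first digit d: log10(1 + 1/d).
digit_prob = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for n in range(1, 10):
    print(n, round(cumulative[n], 3), round(digit_prob[n], 3))
```

The entry for $n = 4$ reproduces the value 0.699 quoted above, against the "intuitive" $4/9 \approx 0.444$.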

At least two books call attention to this peculiarity, Furlan [6] and Wallis
[18], but to my knowledge there are only five published papers on the subject,
Benford [2], Furry et al [7], [9], Gini [8], and Herzel [11]. The first consists of
excellent empirical verifications and a discussion of the implied distribution of
2nd, 3rd, ... significant digits. The second and third put forth the thesis that
the distribution of significant digits should not depend markedly on the underlying
distribution, and the authors present numerical evaluations for a range
of underlying distributions in support of their contention. The fourth maintains
that explanation is to be sought in empiric considerations. The fifth considers
three different urn models; each yields a distribution of initial digits which the
author compares with $\log_{10}(n + 1)$.

This paper is a theoretical discussion of why and to what extent this so called 
“abnormal law’’ must hold. The flavor of the results is, I think, conveyed in the 
following remarks. 

(i) The only distribution for first significant digits which is invariant under
scale change of the underlying distribution is $\log_{10}(n + 1)$. Contrary to suspicion
this is a non-trivial mathematical result, for the variable $n$ is discrete.

(ii) Suppose one has a horizontal circular disc of unit circumference which is
pivoted at the center. Let the disc be given a random angular displacement
$\theta$ where $-\infty < \theta < \infty$. If the final position of the disc mod one is called $\varphi$, i.e.,

$\varphi = \theta \bmod (1)$,   $0 \le \varphi < 1$,

then $\varphi$ is a random variable whose probability structure is determined entirely
by that of $\theta$. In fact if

$\Pr(x \le \theta < x + dx) = g(x)\,dx,$

and

$\Pr(y \le \varphi < y + dy) = f(y)\,dy,$

then

$f(y) = \sum_{m=-\infty}^{\infty} g(y + m).$

Now it is intuitively obvious that for a wide range of possible distributions
of $\theta$ the distribution of $\varphi$ should be approximately uniform, i.e.,

$f(y) \approx 1$,   $0 \le y < 1$.

This and related properties of distributions wrapped around a circle have been
known for some time, Dvoretsky [4], Lévy [14], Robbins [15], and put to various
uses, Aitchison [1], Brown [3], Horton and Smith [12], Tocher [17].
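
The approximate uniformity of the wrapped density $f(y) = \sum_m g(y + m)$ can be seen numerically. The Python sketch below takes $g$ to be a normal density with mean zero, which is an assumption made purely for illustration; the function names and the truncation of the sum to $|m| \le 50$ are likewise illustrative choices.

```python
import math


def normal_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def wrapped_density(y, sigma, terms=50):
    """f(y) = sum over m of g(y + m), truncated to |m| <= terms."""
    return sum(normal_pdf(y + m, sigma) for m in range(-terms, terms + 1))


# Largest deviation of f(y) from 1 on a grid of y in [0, 1), for two spreads.
grid = [j / 100 for j in range(100)]
dev_narrow = max(abs(wrapped_density(y, 0.3) - 1.0) for y in grid)
dev_wide = max(abs(wrapped_density(y, 1.0) - 1.0) for y in grid)
```

With standard deviation 0.3 the wrapped density is still visibly non-uniform, whereas with standard deviation 1 its deviation from 1 is already negligible.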

The logarithmic law of left-most significant digits is a consequence of the
above property of random variables mod one. One can see this as follows. Let
$F(x)$ be the cumulative distribution function for the population of physical constants
(taken non-negative for convenience). Define $D(x)$ by

$D(x) = \sum_{m=-\infty}^{\infty} [F(x 10^{m}) - F(10^{m})]$,   $x > 0$.

$D(n)$ for $n = 2, 3, \cdots, 10$ gives the proportion of the population with first
significant digit $n - 1$ or less. The logarithmic "law" states that $D(n)$ should be
approximately $\log_{10}(n)$. Thus one suspects that

$\log_{10}(x) \approx \sum_{m=-\infty}^{\infty} [F(x 10^{m}) - F(10^{m})].$

A change of variables will make clear the connexion with the spinning disc. Let
$y = \log_{10}(x)$ and $G(y) = F(10^{y})$.
One then has

$y \approx \sum_{m=-\infty}^{\infty} [G(y + m) - G(m)],$

or, taking derivatives,

$1 \approx \sum_{m=-\infty}^{\infty} g(y + m).$

This latter approximate equality is the one mentioned before in connexion with
random variables mod one.

Section 1 gives the mathematical support for contention (i) while Section 2 
provides a mathematical basis for the approximation alluded to in (ii). 

After the mathematical work of Section 2 had been completed I discovered 
the basic mathematical idea without the detail in a discussion by I. J. Good of a 
paper by Tocher [17]. 


1. An invariance principle. The population of known physical constants 
changes daily, but the collection of such constants can be regarded as a large 
sample from an unknown underlying distribution of all physical constants. It is 
this underlying distribution in which interest will center. 

Such mental constructs are familiar in the natural sciences. Thus most physical 
objects are regarded as having a density even though they are “known” to have 
a granular structure at the atomic level. Such entities are of course outside the 
compass of mathematics per se. 

Consider the population of all physical constants and the derived distribution 
of first significant digits. Suppose all the physical constants were multiplied by 
some fixed number. What would happen to the distribution of first significant 
digits? One feels, I think, that it would be the same as before. This invariance
property is enough, as is shown below, to characterize the distribution completely.
$\log_{10}(n + 1)$ emerges as the necessary cumulative. The basic mathematical
fact is that a certain derived functional equation has one and only one
solution.

Suppose $F(x)$ is the cumulative distribution function for the population of all
physical constants (assumed non-negative) in accordance with their size. Then

(1)   $D(x) = \sum_{m=-\infty}^{\infty} [F(x 10^{m}) - F(10^{m})]$,   $x > 0$,

is a well defined function for positive $x$; $D(n)$ for $n = 2, \cdots, 9, 10$ gives the
proportion with first significant digit $n - 1$ or less, since all numbers between
$10^{m}$ and $n \times 10^{m}$ begin with $n - 1$ or less.

If all the physical constants are multiplied by a positive constant $c$, then the
resulting cumulative is $F(x/c)$. The postulated invariance yields

$D(n) = \sum_{m=-\infty}^{\infty} \left[F\left(\frac{n}{c} 10^{m}\right) - F\left(\frac{10^{m}}{c}\right)\right],$

i.e.,

(2)   $D(n) = D(n/c) - D(1/c)$,   $c > 0$; $n = 2, \cdots, 10$.

If the relation (2) held for arbitrary positive real $n$ rather than $n = 2, \cdots, 10$
one could, assuming continuity, immediately deduce $D(n) = \log_{10} n$. We now
show this conclusion to be justified under even weaker conditions than are implicit
in Equation (2).

THEOREM 1. If
1. $D(2) + D(x) = D(2x)$,
2. $D(10) + D(x) = D(10x)$,
3. $D(x)$ is continuous;
4. $D(10) = 1$;
then $D(x) = \log_{10}(x)$, $x > 0$.
PROOF. Let $H(x) = D(10^{x})$. Then conditions 1 and 2 become

(3)   $H(\log n) + H(y) = H(\log n + y)$,   $-\infty < y < \infty$, $n = 2, 10$.

Thus $H(N \log n) = N H(\log n)$ if $N$ integral, and one has $H(N) = N$ since
$H(1) = 1$. From the theory of continued fractions one knows, Hardy and
Wright [10],

$\log 2 = (p_m/q_m) + o(1/q_m)$   $(m \to \infty)$

with $p_m$, $q_m$ integers. I.e.,

$q_m \log 2 = p_m + o(1)$   $(m \to \infty)$.

Hence by hypothesis 2

$q_m H(\log 2) = p_m + o(1)$   $(m \to \infty)$.

Therefore $H(\log 2) = \log 2$. Suppose $\alpha$ irrational, and let $[x]$ denote the largest
integer not exceeding $x$. Then it is well known, Kac ([13], p. 41), that the sequence

$a_n = n\alpha - [n\alpha],$

is uniformly distributed on $[0, 1]$. Thus there exists a subsequence $a_{n'}$ converging
to any fixed $h$ $(0 \le h < 1)$. Take $\alpha$ to be $\log 2$.

$H(a_{n'}) = n' H(\log 2) - [n' \log 2] = a_{n'}.$

Letting $n'$ tend to infinity yields, by the assumed continuity, $H(h) = h$. Since
$y = [y] + y - [y]$, $H(y) = y$, $(-\infty < y < \infty)$, and $D(x) = \log_{10} x$.

It is reasonable to consider $F(x)$ continuous from which it follows that the
$D(x)$ of (1) is continuous and thence by Theorem 1 that $D(x) = \log_{10}(x)$.
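
The uniform distribution of $n \log 2 - [n \log 2]$ used in the proof has a familiar concrete face: the first significant digit of $2^{n}$ is determined by the fractional part of $n \log_{10} 2$, so the first digits of the powers of 2 should follow the logarithmic law. The Python sketch below is an illustrative experiment, not part of the paper; the choice of the first 1000 powers and the helper name `leading_digit` are assumptions.

```python
import math
from collections import Counter


def leading_digit(n):
    """First significant digit of the positive integer n."""
    return int(str(n)[0])


N = 1000
counts = Counter(leading_digit(2 ** k) for k in range(1, N + 1))

# Empirical proportion with first digit <= n, next to log10(n + 1).
running = 0
table = []
for n in range(1, 10):
    running += counts[n]
    table.append((n, running / N, math.log10(n + 1)))

for row in table:
    print(row)
```

Each row pairs an observed cumulative proportion with the corresponding $\log_{10}(n + 1)$; the two columns should agree closely.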


2. An approximation. Drop from consideration any invariance postulate. Consider
to what degree $\log_{10} x$ provides an approximation to

$\sum_{m=-\infty}^{\infty} [F(x 10^{m}) - F(10^{m})]$,   $x < 10$.

($F(\cdot)$ has the same significance as before.) Let $G(y) = F(10^{y})$. Then one may
as well consider how $x$ approximates

(4)   $J(x) = \sum_{m=-\infty}^{\infty} [G(x + m) - G(m)].$

It is reasonable to take some canonical representation of $G(x)$ and hope that
the sum $J(x)$ can be evaluated explicitly. A statistician immediately thinks of
Fourier transforms since characteristic functions always exist for distributions.

A trouble immediately appears. In trying to evaluate the sum

$J(x) = \sum_{m=-\infty}^{\infty} [G(x + m) - G(m)],$

one immediately stubs against sums of the form

$\sum_{m=0}^{\infty} e^{-imu},$

which of course do not converge. To overcome this difficulty one may introduce
a "convergence factor" and subsequently sneak it out again at the end.
Thus, define $J(x|t)$ by

$J(x|t) = \sum_{m=-\infty}^{\infty} [G(x + m) - G(m)]\, t^{|m|}$,   $0 < t < 1$,

and $W(u)$ by $W(u) = \int_{-\infty}^{\infty} e^{iut}\,dG(t)$. Then $W(u)$ exists for all $-\infty < u < \infty$,
and

(5)   $G(x) - G(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-ixu}}{iu}\, W(u)\,du.$

Suppose

$W(u) = O(|u|^{-h})$,   $h > 0$,   $(|u| \to \infty)$.

Then, by merely summing geometric series after switching the order of summation
and integration, one has

$J(x|t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-ixu}}{iu}\, W(u) P(u, t)\,du;$

$P(u, t)$ is the Poisson kernel given by

(6)   $P(u, t) = \frac{1 - t^2}{1 + t^2 - 2t \cos u}.$

The interchange of limits is justified by the assumed order condition.
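
Two properties of the kernel (6) drive the argument: its average over one period is 1, and as $t$ tends to one it concentrates at the origin. Both can be checked numerically. The Python sketch below uses a simple midpoint rule; the function names, the step count and the test function $\cos u$ are illustrative assumptions, not taken from the paper.

```python
import math


def poisson_kernel(u, t):
    """P(u, t) = (1 - t^2) / (1 + t^2 - 2 t cos u), 0 < t < 1."""
    return (1.0 - t * t) / (1.0 + t * t - 2.0 * t * math.cos(u))


def averaged(q, t, steps=20000):
    """Midpoint-rule value of (1 / 2 pi) * integral over (-pi, pi) of P(u, t) q(u) du."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for j in range(steps):
        u = -math.pi + (j + 0.5) * h
        total += poisson_kernel(u, t) * q(u)
    return total * h / (2.0 * math.pi)


mass = averaged(lambda u: 1.0, 0.9)     # average of the kernel itself
near_q0 = averaged(math.cos, 0.99)      # average against Q(u) = cos u
```

Since $P(u, t) = \sum_m t^{|m|} e^{-imu}$, the average against $\cos u$ equals $t$ exactly, so `near_q0` is about 0.99 and tends to $\cos 0 = 1$ as $t$ tends to one.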

The Poisson kernel when properly normalized is a frequency function uncommon
in statistical circles. Thus

$\frac{1}{2\pi} \int_{-\pi}^{\pi} P(u, t)\,du = 1$   and   $P(-u, t) = P(u, t).$

Furthermore, the variance of the distribution goes to zero as $t$ tends to one.
Hence

$\lim_{t \to 1} \frac{1}{2\pi} \int_{-\pi}^{\pi} P(u, t) Q(u)\,du = Q(0),$

if $Q$ is continuous at the origin.

Now to return to $J(x|t)$. Splitting the integral up into integrals over contiguous
intervals of length $2\pi$, and utilizing the periodicity of $P(u, t)$ yields

$J(x|t) = \sum_{k=-\infty}^{\infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1 - e^{-ix(u + 2\pi k)}}{iu + i2\pi k}\, W(u + 2\pi k) P(u, t)\,du.$

A second appeal to the assumed order conditions on $W(u)$ allows one to take
limits as $t \to 1$ term by term. Thus, since $W$ is continuous,

$J(x|1-) = x + \sum_{k \ne 0} \frac{1 - e^{-i2\pi kx}}{i2\pi k}\, W(2\pi k).$

But by Abel's theorem, Titchmarsh [16], $J(x) = J(x|1-)$. Hence finally

(7)   $J(x) - x = \sum_{k \ne 0} \frac{1 - e^{-i2\pi kx}}{i2\pi k}\, W(2\pi k).$

In the case $G(x)$ is symmetric about zero, viz. $G(x) = 1 - G(-x)$, one has

(8)   $J(x) - x = \sum_{k=1}^{\infty} W(2\pi k)\, \frac{\sin(2\pi kx)}{\pi k}.$

It is now clear that the quality of the approximation is in general high and 
does not depend on the fine structure of G and hence of F. For only W in the 
neighborhood of the origin is liable to inflate the sum appreciably and this de- 
pends primarily on the nature of G at infinity. 

If, for example, $G(y)$ is Gaussian with mean zero and variance $\sigma^2$, then
$W(u) = \exp(-\tfrac{1}{2}\sigma^2 u^2)$, and it is very clear that as $\sigma$ increases and the tails lift
the approximation improves markedly. This is in excellent accord with one's
intuition.
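
For the Gaussian case the series in (8) can be compared with a direct evaluation of $J(x)$. The Python sketch below is illustrative only: the function names, the truncation of both sums at 60 terms and the test point $x = 0.3$ are assumptions, and the normal cumulative is computed from `math.erf`.

```python
import math


def normal_cdf(y, sigma):
    return 0.5 * (1.0 + math.erf(y / (sigma * math.sqrt(2.0))))


def j_direct(x, sigma, terms=60):
    """J(x) = sum over m of [G(x + m) - G(m)], truncated to |m| <= terms."""
    return sum(
        normal_cdf(x + m, sigma) - normal_cdf(m, sigma)
        for m in range(-terms, terms + 1)
    )


def j_series(x, sigma, terms=60):
    """x + sum_{k>=1} W(2 pi k) sin(2 pi k x) / (pi k), W(u) = exp(-sigma^2 u^2 / 2)."""
    total = x
    for k in range(1, terms + 1):
        w = math.exp(-0.5 * (sigma * 2.0 * math.pi * k) ** 2)
        total += w * math.sin(2.0 * math.pi * k * x) / (math.pi * k)
    return total


x = 0.3
for sigma in (0.25, 0.5, 1.0):
    print(sigma, j_direct(x, sigma) - x, j_series(x, sigma) - x)
```

The two evaluations agree, and the error $J(x) - x$ shrinks rapidly as $\sigma$ grows, in line with the remark about the tails.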

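An explicit bound on $J(x) - x$ may be obtained by noticing that $|1 - e^{-i2\pi kx}| \le 2$,
and hence

$|J(x) - x| \le \sum_{k \ne 0} \frac{1}{\pi |k|}\, |W(2\pi k)|.$

If $G$ has a density $g$, then

$W(2\pi k) = -\frac{1}{2\pi k i} \int_{-\infty}^{\infty} e^{i2\pi kx}\,dg(x).$

Thus

$|W(2\pi k)| \le \frac{1}{2\pi |k|} \int_{-\infty}^{\infty} |dg| = \frac{1}{2\pi |k|}\, V[g],$

where $V[g]$ is the variation of $g$ on $(-\infty, \infty)$. Whence

$|J(x) - x| \le \frac{1}{2\pi^2}\, V[g] \sum_{k \ne 0} \frac{1}{k^2} = \frac{1}{6}\, V[g].$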
We summarize in the following

THEOREM 2. If

$W(u) = \int_{-\infty}^{\infty} e^{iux}\,dG(x);$

$W(u) = O(|u|^{-h})$,   $h > 0$,   $|u| \to \infty$;

then

$\sum_{m=-\infty}^{\infty} [G(x + m) - G(m)] = \sum_{k=-\infty}^{\infty} \frac{1 - e^{-i2\pi kx}}{i2\pi k}\, W(2\pi k).$

COROLLARY. If

$W(u) = \int_{-\infty}^{\infty} e^{iux} g(x)\,dx;$

$V[g] < \infty;$

$J(x) = \sum_{m=-\infty}^{\infty} [G(x + m) - G(m)];$

then

$|J(x) - x| \le \tfrac{1}{6} V[g].$


3. Remarks. In inventory problems one is often concerned with non-negative
random variables $X_i$, $i = 1, 2, \cdots$ which are independent, identically distributed,
and possess a mean much smaller than some number $K$. One is interested
in the first time $S_n = X_1 + \cdots + X_n$, $n = 1, 2, \cdots$, exceeds $K$. Let
this time be a random variable $T$. If the time axis is split up into contiguous
intervals (periods) of length $P$, much smaller than the mean and variance of $T$,
then it is often assumed that the time during the period at which the first exceedance
occurs has an approximately uniform distribution. Time is here being
measured from the beginning of the period. This is intuitively very appealing.
Suppose $T$ has cumulative $G(t)$ and that the problem is scaled such that $P = 1$.
Then intuition says $J(x) - x$ is "small", where

$J(x) = \sum_{n=0}^{\infty} [G(x + n) - G(n)].$

Previous results make it clear why this is in fact so.

A close connexion exists between $J(x) - x$ being small and Poincaré's observation
on finely divided roulette wheels. Suppose the disc mentioned in the
introduction is divided up into $2n$ contiguous intervals alternately of length $p$
and $\beta$. Let $p/(p + \beta)$ and $\beta/(p + \beta)$ be independent of $n$. The segments of length
$p$ are called red, the others black. Fréchet [5] shows, for arbitrary distributions
of $\theta$, that the probability of obtaining red approaches $p/(p + \beta)$ as $n$ tends to
infinity and thus similarly for black. Here the quality of the approximation is
improved by shrinking the fundamental unit relative to the variance of the
underlying distribution rather than increasing the variance relative to the fundamental
unit.





These considerations have an obvious import for the generation of pseudo-random
numbers both by electronic computers and by special purpose machines.

The foregoing results bear on questions of round-off in computing machines.
Since $d(uv) = u\,dv + v\,du$ the error resulting from multiplying two rounded
numbers will be governed primarily by the first significant digits of the two numbers
being multiplied. Now the distribution of first significant digits, favoring as
it does low order digits, tends to produce less error than would be the case if first
significant digits were uniform as has sometimes been assumed.


Acknowledgments. F. Mosteller and W. Kruskal provided the references to 
previously published material, and I am most grateful. R. Hamming was first 
to call my attention to the logarithmic law. He also was first to suggest that 
only the logarithm would satisfy the invariance principle. To one who invariably

MARKOV RENEWAL PROCESSES: DEFINITIONS AND
PRELIMINARY PROPERTIES¹


By RONALD PYKE²
University of Washington


1. Summary. This paper contains the definition of and some preliminary 
results on Markov Renewal processes and Semi-Markov processes. The close 
relationship between these two types of processes is described. The concept of 
regularity is introduced and characterized. A classification of the states of a 
Markov Renewal process is described and studied. 


2. Introduction. At the International Congress of Mathematicians held at 
Amsterdam in 1954, Lévy [1] and Smith [2] independently presented papers in 
which a new class of stochastic processes, called Semi-Markov processes 
(S.-M.P.) by both authors, was defined. These processes are generalizations of 
both continuous and discrete parameter Markov processes with countable state 
spaces. In the case of Lévy, the suggestion of this possible generalization is 
credited to K. L. Chung. Also in 1954, Takács [3] introduced essentially the
same type of stochastic process, and applied it to some problems in Counter
theory.

A rough, yet descriptive, definition of an S.-M.P. would be that it is a stochastic
process which moves from one to another of a countable number of states with
the successive states visited forming a Markov chain, and that the process stays
in a given state a random length of time, the distribution function (d.f.) of which
may depend on this state as well as on the one to be visited next. It is thus a
Markov Chain for which the time scale has been randomly transformed.

The family of stochastic processes to be defined and studied in this paper, 
called Markov Renewal processes (M.R.P.), may be shown to be equivalent to 
the family of S.-M.P.’s. An M.R.P. is one which records at each time $t$ the
number of times a particle has visited each of the possible states up to time $t$,
if the particle moves from state to state according to a Markov Chain and if the 
time required for each successive move is a random variable (r.v.) whose d.f. 
may depend on the two states between which the move is being made. 

It will be seen, after the definition of an M.R.P. has been formalized in Section 
3 below, that a Renewal process (i.e., a sequence of independent, identically 
distributed nonnegative r.v.’s) is equivalent to the special case of an M.R.P. 
with one state. However, as will become evident in the discussions below and in 
[4], the relationship between Renewal theory and that of M.R.P.’s is very much
stronger than this fact alone indicates. Indeed, it would not be overexaggerating
to describe the present theory as a marriage of the theories of Markov Chains 
and of Renewal processes. It is this close relationship which suggested the nomen- 
clature, Markov Renewal process. 

In Section 3, M.R.P.’s and S.-M.P.’s are defined, as well as some related 
processes. Although the processes studied here have been given “constructive”
definitions, and hence are automatically separable, and have no instantaneous 
states (as may the S.-M.P.’s defined by Lévy [1]), there still exists the problem 
of whether or not an infinite number of transitions may be made in a finite 
interval of time. This problem is studied in Section 4, where a complete charac- 
terization is given of those M.R.P.’s for which only a finite number of transitions 
may be made in a finite interval of time. Such an M.R.P., with only a slight 
qualification, is said to be regular. It is proved that every M.R.P. with only 
finitely many states is regular. Several sufficient conditions for regularity are also 
given. In Section 5, an extension is made of the terminology used in classifying 
the states of a Markov Chain, to cover the case of M.R.P.’s. It is shown that 
the classification of any particular state in an M.R.P. is very greatly dependent 
upon the classification of this state in an embedded Markov Chain. 

Many papers have been written on S.-M.P.’s and M.R.P.’s since 1954, mostly
in the past two years. All papers known to this author which concern these 
processes and which are not referred to in the body of this paper are included 
in the supplementary references at the end of this paper, thus providing the 
reader with a complete list of references on this subject. 


3. Definitions and notations. Bold face letters such as $\mathbf{F}$, $\mathbf{Q}$, $\mathbf{H}$, $\mathbf{f}$, $\mathbf{q}$, $\mathbf{h}$ are
consistently used in this paper to denote (real) matrix-valued functions, with the
capital letters having domain $(-\infty, \infty)$ and the lower-case letters having
domain $(0, \infty)$. Mass functions (i.e., distribution functions whose total variations
need not be equal to one) will be denoted by capital italic letters, whereas the
corresponding lower-case letters will denote their respective Laplace-Stieltjes
(L.-S.) transforms. For example, for $s \ge 0$, $f(s) = \int_{0-}^{\infty} e^{-sx}\,dF(x)$, which may
for the present be infinite. It will be convenient to introduce the degenerate d.f.’s,
$U_c(x) = 1$ or $0$, according as $x \ge c$ or $x < c$. Unless otherwise stated, the subscripts
$i$, $j$ in a matrix $(b_{ij})$ or elsewhere will run through the integers greater than or
equal to 1 and not greater than $m$, where $m$, fixed, is either a finite positive
integer or plus infinity. The following convolution notation is used in this and
subsequent papers: $K(t) * L(t) = \int_{0-}^{t} K(t - y)\,dL(y)$ if $t \ge 0$ and $= 0$ if $t < 0$,
for functions $K$ and $L$ for which the Lebesgue-Stieltjes integral is defined. Write
$K * L$ for the function $K(\cdot) * L(\cdot)$, $K^{(0)} = U_0(\cdot)$, $K^{(n)} = K^{(n-1)} * K$ $(n = 1, 2, \ldots)$
and $K^{(\infty)} = \sum_{n=1}^{\infty} K^{(n)}$ whenever the series converges.
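The convolution notation can be made concrete for mass functions concentrated on the nonnegative integers. The following is a minimal sketch under that lattice assumption (the paper allows general mass functions); a mass function is represented by its list of point masses, and the helper names `convolve`, `power` and `cdf` are illustrative only.

```python
def convolve(k_mass, l_mass):
    """Point masses of K*L, given the point masses of K and L on 0, 1, 2, ..."""
    out = [0.0] * (len(k_mass) + len(l_mass) - 1)
    for i, k in enumerate(k_mass):
        for j, l in enumerate(l_mass):
            out[i + j] += k * l
    return out

def power(k_mass, n):
    """Point masses of the n-fold convolution K^(n); K^(0) is U_0 (unit mass at 0)."""
    result = [1.0]
    for _ in range(n):
        result = convolve(result, k_mass)
    return result

def cdf(mass, t):
    """Value at t of the mass function with the given point masses."""
    if t < 0:
        return 0.0
    return sum(mass[: int(t) + 1])

if __name__ == "__main__":
    K = [0.0, 0.5, 0.5]          # mass 1/2 at t = 1 and at t = 2
    print(power(K, 2))           # point masses of K^(2)
    print(cdf(power(K, 2), 3))   # K^(2)(3)
```

Because $K^{(0)} = U_0$ carries unit mass at the origin, it acts as the identity for the convolution, which is why `power` starts from `[1.0]`.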

DEFINITION 3.1. Let $Q = (Q_{ij})$ be a matrix-valued function on $(-\infty, \infty)$. $Q$
is called a matrix of transition distributions if the $Q_{ij}$ are mass functions
satisfying (i) $Q_{ij}(t) = 0$ for $t \le 0$ and (ii) $\sum_{j=1}^{m} Q_{ij}(+\infty) = 1$ $(1 \le i < m + 1)$.

For each $i$ and every real $t$, set $H_i(t) = \sum_{j=1}^{m} Q_{ij}(t)$. With this notation, (ii)
of Definition 3.1 is equivalent to stating that every $H_i$ is a d.f.

DEFINITION 3.2. The $m \times 1$ vector $A = (a_1, a_2, \ldots, a_j, \ldots)$ is called a
vector of initial probabilities if it satisfies (i) $a_i \ge 0$ and (ii) $\sum_{i=1}^{m} a_i = 1$.

DEFINITION 3.3. The $(J, X)$-process³ is defined as any two-dimensional
stochastic process $\{(J_n, X_n);\ n \ge 0\}$ defined on a complete probability space
$(\Omega, \mathcal{B}, P)$, that satisfies $X_0 = 0$ a.s.,

(3.1)  $P[J_0 = k] = a_k$

and

(3.2)  $P[J_n = k,\ X_n \le x \mid J_0, J_1, X_1, J_2, X_2, \ldots, J_{n-1}, X_{n-1}] = Q_{J_{n-1},k}(x)$

for all $x \in (-\infty, \infty)$ and $1 \le k < m + 1$.

Set $S_n = \sum_{i=0}^{n} X_i$ for $n \ge 0$. The $(J, X)$-process defined above is closely
related to a Markov process as shown in

LEMMA 3.1. The two-dimensional $(J, S)$-process is a Markov process, and the
$J$-process is a Markov Chain. In particular for $1 \le k < m + 1$, $n > 0$,

(3.3)  $P[J_n = k,\ S_n \le y \mid J_0, J_1, S_1, \ldots, J_{n-1}, S_{n-1}] = Q_{J_{n-1},k}(y - S_{n-1})$

and

(3.4)  $P[J_n = k \mid J_0, J_1, S_1, \ldots, J_{n-1}, S_{n-1}] = Q_{J_{n-1},k}(+\infty)$.

PROOF. That the $J$-process is a Markov Chain satisfying (3.4) is an immediate
consequence of (3.1), (ii) of Definition 3.1 and the Lebesgue monotone
convergence theorem applied to (3.2) when $x \to +\infty$. That the $(J, S)$-process is a
Markov process is implied by (3.3) and (3.1). To verify (3.3), write the left-hand
side of this expression as

$P[J_n = k,\ X_n \le y - S_{n-1} \mid J_0, J_1, S_1, \ldots, J_{n-1}, S_{n-1}]$.

Since the conditioning $\sigma$-field of this conditional expectation is equal to that of
the left-hand side of (3.2), and since this $\sigma$-field is generated by a finite number
of r.v.’s, it is known that a conditional probability distribution (as defined by
Doob [5], p. 26) exists, by means of which it is easily seen that (3.2) implies
(3.3).

Because of (3.4), it is natural to define $p_{ij} = Q_{ij}(+\infty)$ and $P = (p_{ij})$. By
Definition 3.1, $P$ is a stochastic matrix. Furthermore, if $p_{ij} > 0$, define
$F_{ij} = p_{ij}^{-1} Q_{ij}$, while if $p_{ij} = 0$ set $F_{ij} = U_1$. (Actually, when $p_{ij} = 0$, $F_{ij}$ may be
chosen arbitrarily. There is some notational advantage, however, in choosing a
d.f. which has all moments finite, but the particular choice of a degenerate d.f.
has no special merit.) Set $F = (F_{ij})$. For convenience define $J_\infty = \infty$.
Furthermore, introduce the following notation for moments.

(3.5)  $b_{ij} = \int_0^\infty t\,dF_{ij}(t)$,   $\eta_i = \int_0^\infty t\,dH_i(t)$,

       $\sigma_{ij}^2 = \int_0^\infty (t - b_{ij})^2\,dF_{ij}(t)$,   $\sigma_i^2 = \int_0^\infty (t - \eta_i)^2\,dH_i(t)$.


³ As a convenient abbreviated notation, stochastic processes will be denoted by the
letter(s) used to designate the corresponding r.v.’s.

The following easily verified consequences of the above definitions will be
useful in later discussions.

(3.6)  $P[X_n \le x \mid J_0, \ldots, J_{n-1}] = H_{J_{n-1}}(x)$,

       $P[J_n = j \mid J_0, \ldots, J_{n-1}] = p_{J_{n-1},j}$,

       $P[X_n \le x \mid J_0, \ldots, J_n] = F_{J_{n-1},J_n}(x)$,

       $P[X_{n_1} \le x_1, X_{n_2} \le x_2, \ldots, X_{n_k} \le x_k \mid J_n;\ n \ge 0]$
       $\quad = P[X_{n_1} \le x_1, \ldots, X_{n_k} \le x_k \mid J_0, J_1, \ldots, J_{n_k}]$
       $\quad = \prod_{i=1}^{k} F_{J_{n_i - 1},J_{n_i}}(x_i)$

for $0 < n_1 < \cdots < n_k$, all equalities holding with probability one. It follows
from the last two relationships that $X_{n_1}, X_{n_2}, \ldots, X_{n_k}$ are mutually
conditionally independent given $J_{n_1 - 1}, J_{n_1}, \ldots, J_{n_k - 1}, J_{n_k}$ (e.g., cf. [6], Definition 3).

In Renewal theory, the basic process studied is that which gives the number
of partial sums or renewals in the intervals $(0, t]$ for all $t \ge 0$. The natural
analogues to this for the present theory are the counting processes defined now.

DEFINITION 3.4. The integer-valued stochastic processes $\{N(t);\ t \ge 0\}$ and
$\{N_j(t);\ t \ge 0\}$ are defined by $N(t) = \sup\{n \ge 0 : S_n \le t\}$ and $N_j(t) =$ no.
of times $J_k = j$ for $0 < k < N(t) + 1$.

Notice that without added restrictions on $m$ and/or $Q$, $N(t)$ may be infinite
with positive probability. Notice also that the counting functions $N_j$ are defined
so as not to record the value of $J_0$. Setting $\mathbf{N}(t) = (N_1(t), N_2(t), \ldots, N_j(t), \ldots)$,
the stochastic process $\{\mathbf{N}(t);\ t \ge 0\}$ is called a Markov Renewal Process
(M.R.P.) determined by $(m, A, Q)$. Clearly $N(t) = \sum_{j=1}^{m} N_j(t)$ a.s.

Related to an M.R.P. is the stochastic process defined now which simply
records the state of the process at each time point.

DEFINITION 3.5. The $Z$-process, $\{Z_t;\ t \ge 0\}$ defined by $Z_t = J_{N(t)}$ is called a
Semi-Markov Process (S.-M.P.) determined by $(m, A, Q)$.
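Definitions 3.3 to 3.5 translate directly into a simulation. The sketch below is one possible reading, not the paper's construction: states are numbered from 0 rather than 1, the matrix $P$ and the sojourn-time sampler (a hypothetical callback `sojourn(i, j, rng)` drawing from $F_{ij}$) stand in for $Q$, and $m$ is assumed finite with strictly positive sojourn times so that the loop terminates.

```python
import random

def simulate_mrp(a, P, sojourn, horizon, rng):
    """Simulate one path of the (J, X)-process up to time `horizon`.

    a       : initial probabilities a_i
    P       : transition matrix p_ij = Q_ij(+inf) of the J-process
    sojourn : sojourn(i, j, rng) draws X_n from F_ij (hypothetical helper)
    Returns [J_0, ..., J_N] and [S_0, ..., S_N], where N = N(horizon).
    """
    states = range(len(a))
    j = rng.choices(states, weights=a)[0]
    J, S = [j], [0.0]
    while True:
        nxt = rng.choices(states, weights=P[j])[0]
        s = S[-1] + sojourn(j, nxt, rng)
        if s > horizon:              # S_{n+1} > t, so N(t) = n
            return J, S
        J.append(nxt)
        S.append(s)
        j = nxt

def counts(J, m):
    """N_j(t): number of k with 0 < k <= N(t) and J_k = j (J_0 is not counted)."""
    return [sum(1 for s in J[1:] if s == j) for j in range(m)]

if __name__ == "__main__":
    rng = random.Random(1)
    P = [[0.2, 0.8], [0.6, 0.4]]
    # Assumed sojourn law: exponential with a rate depending on both states.
    draw = lambda i, j, r: r.expovariate(1.0 + i + 2 * j)
    J, S = simulate_mrp([0.5, 0.5], P, draw, horizon=10.0, rng=rng)
    print("N(t) =", len(J) - 1, " N_j(t) =", counts(J, 2), " Z_t =", J[-1])
```

The last entry of the returned state list is $Z_t = J_{N(t)}$, and the counts sum to $N(t)$, as noted after Definition 3.4.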

Let us introduce some additional vocabulary to facilitate later discussions. We
shall say that a “transition” of an M.R.P. has occurred at each of the time
points $S_0, S_1, S_2, \ldots$. The process (either an M.R.P. or an S.-M.P.) is said
to be “in state $i$” at time $t$, if, and only if, $Z_t = i$.

As defined in Definition 3.4, an M.R.P. is a vector-valued process (infinite
dimensional if $m = \infty$). It is clear that one could construct one-dimensional
processes that are probabilistically equivalent to the $N$-process. For example, the
$Y$-process defined by $Y_t = j + 1 - 2^{1-k}$ on the set $[J_{N(t)-n} = j,\ 0 \le n < k,\ J_{N(t)-k} \ne j]$
and $= \infty$ on the set $[N(t) = \infty]$ may be shown to be equivalent to
the $N$-process, since it records both the state of the M.R.P. and the number of
preceding consecutive transitions to state $j$ for each $t > 0$. For most discussions,
the r.v.’s $N_j(t)$, and especially their expectations, play the central role, as does
$N(t)$ for the special case of a Renewal process, namely the case $m = 1$. However,
the $Y$-process representation serves to emphasize the relationships between an
M.R.P. and an S.-M.P. The $Y$-process is always an S.-M.P., and is called the
associated S.-M.P. of the given M.R.P. It is equal almost surely to the $Z$-process
if and only if $p_{jj} = 0$ for every state $j$ which can be reached with positive
probability. Otherwise the $Y$-process always has an infinity of states regardless of the
value of $m$. It follows from this $Y$-process representation of an M.R.P. that,
theoretically at least, any results about an M.R.P. may be derived from theorems
concerning S.-M.P.’s. It is, however, both convenient and practical to keep these
two kinds of processes distinct, and to use the process most natural for a given
problem. For computation of moments of recurrence times (cf. [7]), it is natural
to work this out for M.R.P.’s since most applications involve processes in which
a transition from a state to itself is possible. It should be observed that although
an M.R.P. has a finite number of states, the associated S.-M.P. will in most
applications have an infinite number of states, as is the case, for example, for a
Renewal process ($m = 1$). On the other hand, for problems concerning the
limiting stationarity of a given M.R.P., transitions from a state to itself play no
essential role. One may then, without loss of generality, work with the related
matrix of transition distributions $Q^* = (Q^*_{ij})$ defined by $Q^*_{ii} = Q_{ii}$ if $p_{ii} = 1$,
$Q^*_{ii} = 0$ if $p_{ii} < 1$ and $Q^*_{ij} = Q_{ij} * \sum_{n=0}^{\infty} Q_{ii}^{(n)}$ if $i \ne j$ and $p_{ii} < 1$. One may
verify

LEMMA 3.2. Every S.-M.P. determined by $(m, A, Q)$ has the same family of joint
d.f.’s as every S.-M.P. determined by $(m, A, Q^*)$.

Any S.-M.P. determined by (m, A, Q*) is called a corresponding S.-M.P. of 
the given M.R.P. 

When $m = 1$, a Markov Renewal process becomes a Renewal process, the
theory of which is extensive (cf. the survey paper on Renewal theory by Smith
[8]). When the transition distributions are of the form $Q_{ij} = p_{ij} U_1$ for all $i$ and
$j$, the Markov Renewal process becomes a Markov Chain, and in this case is
equivalent to its corresponding S.-M.P. by virtue of the constant transition
times. Moreover, a continuous parameter Markov process with $m$ states, all of
which are stable, is a special case of an M.R.P. (in fact, of an S.-M.P.) for which
the $Q_{ij}$ are of the form

(3.7)  $Q_{ij}(t) = p_{ij} \max(0,\ 1 - e^{-\lambda_i t})$   $(-\infty < t < \infty)$

for constants $\lambda_i > 0$, and $p_{ii} = 0$ for every $i$.


4. Finiteness of $N(t)$ and regularity. It may easily be deduced from the
constructive definitions of an M.R.P. and an S.-M.P. given in Section 3 that they
are separable and that almost all sample functions of the $Y$-process, and of the
$Z$-process, are step-functions over an interval of the form $[0, L)$ and identically
equal to infinity over $[L, \infty)$, where $L > 0$ is a possibly infinite r.v. which is a
Borel function of the $Y$-process. Clearly, the sets $[L < \infty]$ and $[N(t) = \infty,\ t \ge L]$
differ only by a set of measure zero. It is important to be able to
characterize those M.R.P.’s for which $L = \infty$, or equivalently, those for which
$N(t) < \infty$ for all $t$. We shall first verify the intuitive result that in the case of
$m < \infty$, the (a.s.) finiteness of $N(t)$ is always true.

LEMMA 4.1. If $m < \infty$, then for all states $i$,

(4.1)  $P[N(t) < \infty \text{ for all } t \ge 0] = 1$.


PROOF. By Definition 3.4, $N(t)$ is nondecreasing. It suffices, therefore, to
prove that $P[N(t) < \infty \mid Z_0 = i] = 1$ for each $t \ge 0$, and for every $i$ for which
$a_i > 0$. Suppose $a_i > 0$. By Definition 3.4, (3.3) and (3.4) one obtains for
$t \ge 0$

$P[N(t) \ge n \mid Z_0 = i] = P[S_n \le t \mid J_0 = i]$
$\quad = \sum_{\mathcal{S}_{n,i}} \Bigl\{\prod_{j=0}^{n-1} p_{\alpha_j \alpha_{j+1}}\Bigr\} \mathop{*}_{j=0}^{n-1} F_{\alpha_j \alpha_{j+1}}(t)$,

where $*$ denotes convolution of the indicated d.f.’s and where

(4.2)  $\mathcal{S}_{n,i} = \{(\alpha_0, \alpha_1, \ldots, \alpha_n) : \alpha_0 = i,\ \alpha_j \text{ an integer},\ 1 \le \alpha_j \le m\ (1 \le j \le n)\}$

is the set of all paths of length $n + 1$ of the $J$-process for which $J_0 = i$. Define
$F = \max_{i,j} F_{ij}$. It is well known that for mass functions $F_1$, $F_2$, $G_1$ and $G_2$ for
which $F_1 \le G_1$ and $F_2 \le G_2$, one has $F_1 * F_2 \le G_1 * G_2$. Consequently, it follows
from (3.6) that

$P[N(t) \ge n \mid Z_0 = i] \le F^{(n)}(t) \sum_{\mathcal{S}_{n,i}} \prod_{j=0}^{n-1} p_{\alpha_j \alpha_{j+1}} = F^{(n)}(t)$.

Since $m < \infty$, one has by Definition 3.1 that $F(0) = 0$ and so for $t > 0$,
$F^{(n)}(t) \to 0$ as $n \to +\infty$.
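The decisive step of the proof is that the $n$-fold convolution $F^{(n)}(t)$ of a d.f. with $F(0) = 0$ tends to zero for each fixed $t$. As a hedged illustration only, suppose the bounding d.f. $F$ is exponential (an assumption made here for convenience, not by the paper); then $F^{(n)}$ is the Erlang d.f., which can be evaluated in closed form. The function name `erlang_cdf` is hypothetical.

```python
import math

def erlang_cdf(n, rate, t):
    """F^(n)(t) when F is exponential with the given rate (the Erlang d.f.)."""
    if n == 0:
        return 1.0 if t >= 0 else 0.0     # F^(0) = U_0
    if t <= 0:
        return 0.0
    # P[sum of n exponentials > t] = P[Poisson(rate * t) < n]
    tail = sum(math.exp(-rate * t) * (rate * t) ** k / math.factorial(k)
               for k in range(n))
    return 1.0 - tail

if __name__ == "__main__":
    t = 5.0
    for n in (1, 5, 10, 20, 40):
        # Upper bound for P[N(t) >= n | Z_0 = i] under the assumed F
        print(n, erlang_cdf(n, 1.0, t))
```

The printed values decrease toward zero as $n$ grows, which is the behavior the bound $P[N(t) \ge n \mid Z_0 = i] \le F^{(n)}(t)$ relies on.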
Any d.f. $F$ satisfying $F(0) = 0$ which is an upper bound for every $F_{ij}$ would
have sufficed in the proof of Lemma 4.1. An alternative choice of $F$ which has a
more intuitive interpretation than that used in the above proof is

$F = 1 - \prod_{i,j} [1 - F_{ij}]$,

the d.f. of the minimum of a family of independent r.v.’s, one corresponding to
each d.f. $F_{ij}$.

A consequence of Lemma 4.1 is that almost all path functions of a $Y$-process
with $m < \infty$ are step-functions over $[0, \infty)$, as is also true for the corresponding
S.-M.P.

Consider now the case of unrestricted $m$. For this case, it is necessary to
impose restrictions in order to insure the (a.s.) finiteness of $N(t)$. To see this, the
simplest example is the degenerate one for which $Q_{j,j+1} = U_{2^{-j}}(\cdot)$ $(j \ge 1)$ and
all other $Q_{ij} = 0$. For this example, $N(t) = n$ whenever $1 - 2^{-n} \le t < 1 - 2^{-n-1}$
for $n \ge 0$, while $N(t) = \infty$ whenever $t \ge 1$. That is, $L = 1$ (a.s.). In
what follows, a necessary and sufficient condition for the (a.s.) finiteness of $N(t)$
for every $t \ge 0$ is given, as well as several sufficient conditions which are
applicable in the more common situations.
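The degenerate example above can be traced by hand or by a few lines of code. The sketch below assumes the process starts in state 1, so that the $n$th transition occurs at $S_n = 1 - 2^{-n}$; since $N(t)$ is infinite for $t \ge 1$, the hypothetical helper `N` caps the count at `n_max`.

```python
def transition_times(n_max):
    """S_0, ..., S_n_max with S_n = 2^-1 + ... + 2^-n for the path 1 -> 2 -> 3 -> ..."""
    times, s = [0.0], 0.0
    for j in range(1, n_max + 1):
        s += 2.0 ** -j          # sojourn in state j is the constant 2^-j
        times.append(s)
    return times

def N(t, n_max=60):
    """Number of transitions by time t, capped at n_max (infinite when t >= 1)."""
    return sum(1 for s in transition_times(n_max)[1:] if s <= t)

if __name__ == "__main__":
    for t in (0.0, 0.5, 0.75, 0.9, 0.999, 1.0):
        print(t, N(t))
```

For $t < 1$ the count stabilizes as `n_max` grows, whereas at $t = 1$ it always equals the cap, reflecting $L = 1$.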

For any $c > 0$, define the truncated moments $b_{ij}^{(c)} = \int_0^c t\,dF_{ij}(t)$. Clearly
$b_{ij} = \lim_{c \to \infty} b_{ij}^{(c)}$. Define the family of integer sequences

$\mathcal{S}_i = \{(\alpha_0, \alpha_1, \ldots) : \alpha_0 = i,\ 1 \le \alpha_j < m + 1\ (j \ge 1)\}$.

DEFINITION 4.1. A state $i$ of an M.R.P. determined by $(m, A, Q)$ is said to
be regular-A if either $P[(J_0, J_1, \ldots) \in \mathcal{S}_i] = a_i = 0$, or $a_i > 0$ and there exists
a measurable subset $\mathcal{A} \subset \mathcal{S}_i$ such that $P[(J_0, J_1, \ldots) \in \mathcal{A} \mid J_0 = i] = 1$ and
such that for every $(\alpha_0, \alpha_1, \ldots) \in \mathcal{A}$ and for every $c > 0$ at least one of the
series

(4.3)  $\sum_{j=0}^{\infty} [1 - F_{\alpha_j \alpha_{j+1}}(c)]$,   $\sum_{j=0}^{\infty} b^{(c)}_{\alpha_j \alpha_{j+1}}$

diverges. An M.R.P. determined by $(m, A, Q)$ is said to be regular-A if each of
its states is regular-A. If these properties hold for all initial distributions $A$,
the state or the M.R.P. will be called regular. Since whether an M.R.P. is regular
or not depends only upon the nature of $Q$, we shall alternatively speak of $Q$ as
being regular.

It would have sufficed in the above definition to have required the divergence
of one of the series in (4.3) for only those sequences in $\mathcal{A}$ for which $p_{\alpha_j \alpha_{j+1}} > 0$
for every $j$. This is so because of the convention made earlier, that whenever
$p_{ij} = 0$, $F_{ij} = U_1$ and hence $b_{ij}^{(c)} = 1 > 0$ for all $c \ge 1$. It is shown in the
following theorems that the concepts of the above definition may be used to
characterize the (a.s.) finiteness of $N(t)$.

THEOREM 4.1. For any given state $i$ of an M.R.P. determined by $(m, A, Q)$,

(4.4)  $P[J_0 = i,\ N(t) = \infty \text{ for some } t \ge 0] = 0$

if and only if $i$ is regular-A.
PROOF. The theorem is obvious whenever $a_i = 0$. Assume, therefore, that
$a_i > 0$. For any $(\alpha_0, \alpha_1, \ldots) \in \mathcal{S}_i$, one can show that

(4.5)  $P[N(t) = \infty \mid J_k = \alpha_k;\ k \ge 0] = \lim_{n \to \infty} P[S_n \le t \mid J_k = \alpha_k;\ k \ge 0] = \mathop{*}_{j=0}^{\infty} F_{\alpha_j \alpha_{j+1}}(t)$

by the last relationship of (3.6). It is known, and easily verified, that for
nonnegative r.v.’s Kolmogorov’s Three-Series criterion (cf. [9], p. 236) for a.s.
convergence of a series of independent r.v.’s becomes a “two-series” criterion,
namely, “If $\{V_n : n \ge 1\}$ is a sequence of nonnegative independent r.v.’s, then
the series $\sum_{n=1}^{\infty} V_n < \infty$ (a.s.) if, and only if, for some finite $c > 0$,

$\sum_{n=1}^{\infty} P[V_n > c] < \infty$, and $\sum_{n=1}^{\infty} E[\min(V_n, c)] < \infty$.

Furthermore, if the series does not converge a.s. then it diverges a.s.” Now (4.5)
implies that with respect to the indicated conditional probability measure, the
$S_n$’s are representable as partial sums of independent nonnegative r.v.’s.
Therefore, by the above version of Kolmogorov’s theorem, one has that

(4.6)  $\mathop{*}_{j=0}^{\infty} F_{\alpha_j \alpha_{j+1}}(t) = 0$   $(t \ge 0)$

if, and only if, for all $c > 0$, at least one of the series

$\sum_{n=1}^{\infty} P[X_n > c \mid J_k = \alpha_k,\ k \ge 0]$,   $\sum_{n=1}^{\infty} E[\min(X_n, c) \mid J_k = \alpha_k,\ k \ge 0]$

diverges, which is easily checked to be equivalent to specifying that at least one
of the series given in (4.3) diverges. Therefore, if state $i$ ($a_i > 0$) is regular-A,
there exists a set $\mathcal{A} \subset \mathcal{S}_i$ of conditional probability equal to one, such that for
every $(\alpha_0, \alpha_1, \ldots) \in \mathcal{A}$, (4.6) is satisfied, and hence by (4.5)

$P[N(t) = \infty \mid J_0 = i] = 0$,

thus verifying (4.4). Conversely, if (4.4) is satisfied, then

$0 = P[N(t) = \infty \mid J_0 = i] = E\bigl[\mathop{*}_{n=0}^{\infty} F_{J_n J_{n+1}}(t) \bigm| J_0 = i\bigr]$.

Because of the nonnegativeness of the integrand, this implies

$P\bigl[\mathop{*}_{n=0}^{\infty} F_{J_n J_{n+1}}(t) = 0 \bigm| J_0 = i\bigr] = 1$.

Consequently, in Definition 4.1, one may choose $\mathcal{A} \subset \mathcal{S}_i$ to be the set of all
$\alpha$-sequences satisfying (4.6). Hence, state $i$ is regular-A.
COROLLARY 4.1. For an M.R.P. determined by $(m, A, Q)$,

$P[N(t) < \infty \text{ for all } t] = 1$

if, and only if, it is regular-A.

COROLLARY 4.2. For a given $m$ and $Q$,

$P[N(t) < \infty \text{ for all } t] = 1$

for all choices of a vector of initial probabilities if, and only if, $Q$ is regular.

The foregoing theorems give a complete characterization of those M.R.P.’s
having almost all sample functions equal to step-functions over $(0, \infty)$. In
many practical situations, due to additional assumptions being stated, it is not
necessary to check completely the conditions for regularity as given in Definition
4.1. In many instances, weaker sufficient conditions are available, and possibly
are more easily checked. Some of these are given in the following discussion.

The simplest sufficient condition is a consequence of Lemma 4.1, namely that
if $m < \infty$, then the M.R.P. is regular.

It is also easily shown that if for each $\alpha$-sequence in a subset $\mathcal{A} \subset \mathcal{S}_i$ of
(conditional) probability one, there exists a finite $M > 0$ such that $F_{\alpha_j \alpha_{j+1}}(M) = 0$,
then state $i$ is regular. A particular application of this is to Markov Chains over
discrete time, of which all states must, therefore, be regular.

If $\sum_{j=0}^{\infty} b_{\alpha_j \alpha_{j+1}} < \infty$ on some set of $\alpha$-sequences in $\mathcal{S}_i$ of positive (conditional)
measure, then $i$ is not a regular state. This is a simple consequence of the fact
that the convergence of the series of expectations of a sequence of independent
r.v.’s implies the convergence (a.s.) of the series of r.v.’s.

If for every $\alpha$-sequence in a subset $\mathcal{A} \subset \mathcal{S}_i$ of (conditional) probability one,
either $\sum_{j=0}^{\infty} \sigma^2_{\alpha_j \alpha_{j+1}} < \infty$, or there exists a finite $M > 0$ (possibly depending on
the sequence) such that $F_{\alpha_j \alpha_{j+1}}(M) = 1$ $(j \ge 0)$, then state $i$ is regular if and
only if $\sum_{j=0}^{\infty} b_{\alpha_j \alpha_{j+1}} = \infty$ for every $(\alpha_0, \alpha_1, \ldots) \in \mathcal{A}$. This again is a simple
consequence of known results on sums of (positive) independent r.v.’s (cf. [9],
p. 236).

As a special case of the above, consider the following sufficient condition. If
for every $\alpha$-sequence in a subset $\mathcal{A} \subset \mathcal{S}_i$ of (conditional) probability one, either
$\sum_{j=0}^{\infty} \sigma^2_{\alpha_j \alpha_{j+1}} < \infty$, or there exists a finite $M > 0$ (possibly depending on the
sequence) such that $F_{\alpha_j \alpha_{j+1}}(M) = 1$ $(j \ge 0)$, and there exist two real sequences
$\{\delta_j\}$, $\{\eta_j\}$ of positive numbers such that $\sum_{j=0}^{\infty} \delta_j \eta_j = \infty$ and $F_{\alpha_j \alpha_{j+1}}(\delta_j) \le 1 - \eta_j$
$(j \ge 0)$ for every $(\alpha_0, \alpha_1, \ldots) \in \mathcal{A}$, then state $i$ is regular. This follows
immediately from the preceding since under these conditions $b_{\alpha_j \alpha_{j+1}} \ge \delta_j \eta_j$. This
condition is a corrected version of one due to Smith [2] (see also [10]). (When reading
[2], the reader should note the different meaning of the word regular as it is used
there.)

Consider now a condition designed primarily for continuous parameter Markov
processes with an at most countable number of states. If there exists a set
$\{\lambda_{ij} : 1 \le i, j < m + 1\}$ of finite positive numbers such that for every $\alpha$-sequence
in a subset $\mathcal{A} \subset \mathcal{S}_i$ of probability one, $F_{\alpha_j \alpha_{j+1}}(t) = 1 - \exp(-\lambda_{\alpha_j \alpha_{j+1}} t)$ for all
$t \ge 0$, then state $i$ is regular if, and only if,

(4.7)  $\sum_{j=0}^{\infty} \lambda^{-1}_{\alpha_j \alpha_{j+1}} = \infty$

for all $(\alpha_0, \alpha_1, \ldots) \in \mathcal{A}$. That regularity implies (4.7) is immediate. It suffices to
show that (4.7) implies the divergence of one of the series in (4.3). This is best
seen by simply evaluating the series (4.3) to be

$\sum_{j=0}^{\infty} \exp(-c\lambda_{\alpha_j \alpha_{j+1}})$,

$\sum_{j=0}^{\infty} \{\lambda^{-1}_{\alpha_j \alpha_{j+1}} [1 - \exp(-c\lambda_{\alpha_j \alpha_{j+1}})] - c \exp(-c\lambda_{\alpha_j \alpha_{j+1}})\}$.

Assuming (4.7), one has that if the first series converges, then for $j$ sufficiently
large $1 - \exp(-c\lambda_{\alpha_j \alpha_{j+1}}) > \tfrac{1}{2}$, and so the second series must diverge. For
$\lambda_{ij} = \lambda_i$, one obtains the known result for Markov processes, and if, moreover,
one has $p_{i,i+1} = 1$, the above is the well known result for pure Birth processes
(cf. [11] p. 349, [5] p. 271 and [12] p. 406).
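For a pure Birth process the path is forced ($1 \to 2 \to 3 \to \cdots$), so criterion (4.7) reduces to the divergence of $\sum_j \lambda_j^{-1}$. The sketch below merely tabulates partial sums for two assumed rate sequences, $\lambda_j = j$ and $\lambda_j = j^2$; finite partial sums cannot prove divergence, so this is an illustration rather than a test of regularity. The function name is hypothetical.

```python
def partial_sum_of_inverse_rates(rate, n_terms):
    """Partial sum of 1/lambda_j for j = 1, ..., n_terms (cf. criterion (4.7))."""
    return sum(1.0 / rate(j) for j in range(1, n_terms + 1))

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        linear = partial_sum_of_inverse_rates(lambda j: j, n)         # grows like log n
        quadratic = partial_sum_of_inverse_rates(lambda j: j * j, n)  # stays below pi^2 / 6
        print(n, round(linear, 4), round(quadratic, 4))
```

With $\lambda_j = j$ the sums grow without bound, consistent with regularity; with $\lambda_j = j^2$ they remain bounded, so the expected time to make infinitely many transitions is finite and the process is not regular.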

Lastly, we mention a sufficient condition which is in terms of the underlying
Markov Chain. If state $i$, considered as a state of the $J$-process, a Markov Chain
by Lemma 3.1, is such that with (conditional) probability one, the $J$-process,
starting in state $i$, will reach a recurrent state (cf. [12] p. 353), then state $i$ is
regular. To see this let $j$ (possibly equal to $i$) be the first recurrent state that is
reached, and suppose it was reached at the $n_0$th transition ($n_0$ may be zero).
Let $n_1, n_2, \ldots$ be the successive integers $n$ at which $J_n = j$. Set $T_0 = S_{n_0}$ and
$T_k = S_{n_k} - S_{n_{k-1}}$. By assumption, the r.v.’s $n_k$ are finite a.s., and hence, so are
the $T_k$’s. Furthermore, $\{T_k : k \ge 1\}$ forms a Renewal process and so $\sum_{k=1}^{\infty} T_k = \infty$
(a.s.). Since $\sum_{n=1}^{\infty} X_n = \sum_{k=0}^{\infty} T_k$ (a.s.), the former series diverges and hence
$N(t) < \infty$ (a.s.) for every $t$, as required.


5. Classification of states. In this section, the states of an M.R.P. will be
classified in much the same manner as is done for Markov Chains, the terminology
of the latter being retained. The reader is referred to Feller [12] and to Chung
[13] for material on Markov Chains. To facilitate the definitions to follow, it is
assumed that for all M.R.P.’s considered below, every initial probability is
positive. Consider the notation defined, for all $i$, $j$ and $t \ge 0$, by

(5.1)  $P_{ij}(t) = P[Z_t = j \mid Z_0 = i]$,

(5.2)  $G_{ij}(t) = P[N_j(t) > 0 \mid Z_0 = i]$ if $t \ge 0$, and $G_{ij}(t) = 0$ if $t < 0$,

and let the moments (possibly infinite) of $G_{ij}$ be denoted by $\mu_{ij}$. According to
these definitions, $P_{ij}(t)$ is the probability that an M.R.P., initially in state $i$, is
in state $j$ at time $t$, while $G_{ij}$ is a mass function representing the probability
distribution of the time (first passage time) until the next transition into state
$j$ of a process which is initially in state $i$. Notice that when $i = j$, this definition
does not require the process to leave state $i$ during the first passage time. $\mu_{ii}$ will
be called the mean recurrence time of state $i$.


DEFINITION 5.1. (a) States $i$ and $j$ are said to communicate if, and only if,
either $G_{ij}(\infty) G_{ji}(\infty) > 0$ or $i = j$.

(b) Communication is an equivalence relation, and the disjoint equivalence
classes are called classes and are denoted by $C_i$ (whenever $i \in C_i$).

(c) An M.R.P. is said to be irreducible if, and only if, there is only one class.

(d) A class $C_i$ is said to be essential (or closed) if, and only if, for all $j \in C_i$ and
for all $t \ge 0$, $\sum_{k \in C_i} P_{jk}(t) = 1$.

(e) State $i$ is said to be recurrent (or persistent (Smith [8]) or ergodic (Lévy
[11])) if and only if $G_{ii}(\infty) = 1$, and is said to be transient otherwise.

(f) State $i$ is said to be null recurrent (weakly ergodic (Lévy [11])) if, and only
if, it is recurrent and $\mu_{ii} = \infty$. State $i$ is said to be positive (ergodic (Feller [12]),
strongly ergodic (Lévy [11])) if, and only if, it is recurrent and $\mu_{ii} < \infty$.

As should be expected, the properties defined here for M.R.P.’s are very
closely related to those of the corresponding Markov Chains (c.M.C.)
determined by the same $m$, $A$, and $P$, namely, the $J$-processes. This is illustrated by
the following theorem.

THEOREM 5.1. For a given M.R.P.: (a) state $i$ is recurrent (is transient)
[communicates with state $j$], if and only if state $i$ is recurrent (is transient) [communicates
with state $j$] in the c.M.C.; (b) a class is essential if and only if in the c.M.C. it is
essential; and (c) if $m < \infty$, then state $i$ is positive if and only if state $i$, in the
c.M.C., is positive and $\eta_j < \infty$ for all $j \in C_i$.

PROOF: (a) From (5.2), it follows that

(5.3)  $G_{ij}(\infty) = P[J_n = j \text{ for some } n > 0 \mid J_0 = i]$.

This relation suffices to verify (a) since the properties of recurrence, transience
and communication involve only the quantities $G_{ij}(\infty)$, and since (5.3) shows
that these quantities are identical to the analogous quantities of the c.M.C.
(b) is immediate. To prove (c), let $\mathcal{S}_{n,i,j}$ denote the subset of $(\alpha_0, \ldots, \alpha_n) \in \mathcal{S}_{n,i}$
for which $\alpha_n = j$, where $\mathcal{S}_{n,i}$ is as defined in (4.2). Then one may straightforwardly
show (for arbitrary $m$) that

(5.4)  $\mu_{ij} = \sum_{n=1}^{\infty} \sum_{\mathcal{S}_{n,i,j}} \Bigl\{\prod_{k=0}^{n-1} p_{\alpha_k \alpha_{k+1}}\Bigr\} (b_{\alpha_0 \alpha_1} + \cdots + b_{\alpha_{n-1} \alpha_n})$.

Now assuming $m < \infty$ and $i = j$, one can write

$(\min_{k, j \in C_i} b_{kj})\, \mu^*_{ii} \le \mu_{ii} \le (\max_{k, j \in C_i} b_{kj})\, \mu^*_{ii}$

where

$\mu^*_{ii} = \sum_{n=1}^{\infty} n \sum_{\mathcal{S}_{n,i,i}} \prod_{k=0}^{n-1} p_{\alpha_k \alpha_{k+1}}$

is the mean recurrence time of state $i$ in the c.M.C. Since $m < \infty$ the
minimum shown above is positive and since $m < \infty$ and $\eta_j < \infty$ for all $j \in C_i$, the
maximum shown above is finite.
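The bounds used in part (c) can be checked on a small example. The sketch below does not evaluate the series (5.4) directly; instead it uses a standard first-step recursion, $\mu_{ij} = \eta_i + \sum_{k \ne j} p_{ik}\,\mu_{kj}$ with $\eta_i = \sum_k p_{ik} b_{ik}$, which is an assumption brought in here rather than a formula stated in the paper. States are numbered from 0, the chain is assumed finite and irreducible so the iteration converges, and `mean_first_passage` is a hypothetical helper.

```python
def mean_first_passage(P, eta, j, sweeps=1000):
    """Iterate mu_i = eta_i + sum_{k != j} p_ik * mu_k; returns (mu_0j, ..., mu_{m-1,j})."""
    m = len(P)
    mu = [0.0] * m
    for _ in range(sweeps):
        mu = [eta[i] + sum(P[i][k] * mu[k] for k in range(m) if k != j)
              for i in range(m)]
    return mu

if __name__ == "__main__":
    P = [[0.5, 0.5], [1.0, 0.0]]          # embedded chain (c.M.C.)
    b = [[1.0, 3.0], [2.0, 1.0]]          # b_ij; b[1][1] = 1 by the U_1 convention
    eta = [sum(P[i][k] * b[i][k] for k in range(2)) for i in range(2)]
    mu_00 = mean_first_passage(P, eta, j=0)[0]              # recurrence time in the M.R.P.
    mu_star_00 = mean_first_passage(P, [1.0, 1.0], j=0)[0]  # recurrence time in the c.M.C.
    lo, hi = min(map(min, b)), max(map(max, b))
    print(lo * mu_star_00, "<=", mu_00, "<=", hi * mu_star_00)
```

Setting every $\eta_i$ to one recovers the mean recurrence time $\mu^*_{ii}$ of the embedded chain, which is why the same routine serves for both quantities.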

There does not seem to be any simple necessary and sufficient condition for a
positive state in the case of $m = \infty$. Examples may readily be constructed to
show that a state of an M.R.P. may be positive (null recurrent), while the same
state in the c.M.C. is null recurrent (positive). One sufficient condition for the
positivity of state $i$ is that the state be positive in the c.M.C. and $\sum_{j \in C_i} \eta_j < \infty$.
The proof of this, as well as further discussion along these lines, is contained in
[7] where explicit computations of the $\mu_{ij}$ in terms of the $\mu^*_{ij}$ are made.

MARKOV RENEWAL PROCESSES WITH FINITELY MANY
STATES¹


By RONALD PYKE
University of Washington²

1. Summary. In this paper, Markov Renewal processes having a finite number 
of states are studied. Explicit expressions are derived for the distribution func- 
tions of first passage times, as well as for the marginal distribution function of 
the corresponding Semi-Markov process. Double generating functions are
obtained for the distribution functions of the $N_j$-processes. The limiting behavior
of a Markov Renewal process is discussed, the stationary probabilities being 
derived completely. General Markov Renewal processes are introduced, and 
a related stationary process is determined. Several examples are given. 


2. Introduction. In [1], a class of stochastic processes, called Markov
Renewal processes (M.R.P.), is defined and a preliminary investigation is made
of their structure, and of the related Semi-Markov processes (S.-M.P.)
introduced by Lévy [2], Smith [3] and Takács [4] independently in 1954. The reader
is referred to [1] for the necessary definitions and notation. Roughly speaking,
M.R.P.’s are generalizations both of continuous and discrete parameter Markov
Chains which permit arbitrary distribution functions (d.f.), possibly depending
both on the last state entered and on the next state to be entered, for the times
between successive transitions.

In the present paper we restrict our attention to those M.R.P.’s determined by
$(m, A, Q)$ with $m < \infty$. Recall that, because of Lemma I.4.1,³ all such M.R.P.’s
are regular, i.e., almost all sample functions are finite-valued step functions over
$(-\infty, \infty)$. In Section 3, systems of integral equations are given for the functions
$P_{ij}(t)$ and $G_{ij}(t)$, respectively, the d.f. of $Z_t$ and the d.f. of the time until the
first transition into state $j$, both given $Z_0 = i$. Equations relating these two
functions are also given. Lemma 3.2 is due to Takács (equations (10) and (11)
of [4]),⁴ while (3.2), summed over $i$ with respect to the initial probabilities, is
essentially equation (8) in a paper by Weiss [5], who derived this and other
integral relationships for the purpose of studying the asymptotic behavior of

Received January 18, 1960; revised May 26, 1961. 

1 This research was supported by the Office of Naval Research under Contract Number 
Nonr-266(59), Project Number 042-205. Reproduction in whole or in part is permitted for 
any purpose of the United States Government. 

2 This work was completed while the author was at Columbia University. The contents 
of the first 6 sections comprised an Invited paper given at the Annual meeting of the I.M.S., 
Sept. 1958, Cambridge, Massachusetts. 

3 All numbers which are prefixed by I, refer to the correspondingly numbered part of [1]. 

4 In assumption 4 of [4], which postulates that the X-process is a sequence of independent 
random variables, the word independent should be interpreted as conditional independence 
given the successive states of the system. 
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the ‘“Renewal functions” M ;;(t), defined in (5.10). The relationships of Section 
3 are solved in Section 4, with explicit expressions being given, respectively, in 
Theorems 4.1 and 4.2 for the matrix-valued functions ® and G defined therein. 
In Section 5, the marginal d.f.’s for the Nj-processes are studied, explicit ex- 
pressions being given for their double generating functions. As a consequence, 
the matrix-valued Renewal function 9M is derived. The three most common 
subclasses of M.R.P.’s are briefly discussed in the following section; they are 
Markov Chains, continuous parameter Markov processes, and Renewal processes. 

In Section 7, the problem of characterizing the limiting behavior of an 
M.R.P. is solved. The stationary distribution of the process is derived, thus 
extending a result of Smith [3]. The concept of a general Markov Renewal 
process (G.M.R.P.) is introduced, in which a different matrix of transition 
distributions may be used to determine the initial transition of the process than
is used for all remaining transitions. By using the stationary probabilities ob- 
tained for a given M.R.P. it is possible to construct a related G.M.R.P. which is 
“stationary” in a sense made explicit in Theorem 7.2. Following this, some addi- 
tional examples are briefly stated in the final section. 


3. The probabilities $P_{ij}$ and $G_{ij}$ and related quantities. Of considerable
importance in studying the behavior of M.R.P.’s are the times between successive
occurrences of a given state. It is the purpose of this section to study the d.f.’s
of these times as well as the marginal d.f.’s of the corresponding S.-M.P., the
Z-process, which is to say, the probability of the process being in a given state
at any instant of time. Several identities and relationships between these
quantities will be given in terms of the basic Q-matrix.

Let us assume that $A > 0$ coordinate-wise for every M.R.P. under consideration,
in order that all conditional probabilities will be well defined. The quantities
in question are then defined for all $i$, $j$ (cf. Section I.5) by


(3.1) $G_{ij}(t) = P[N_j(t) > 0 \mid Z_0 = i], \qquad P_{ij}(t) = P[Z_t = j \mid Z_0 = i]$


for $t \ge 0$, and denote their respective L.-S. transforms (when they exist) by
$g_{ij}$ and $\pi_{ij}$. Let $b_{ij}$, $\mu_{ij}$ and $\eta_i$ be the first moments (possibly infinite) of the
mass functions $F_{ij}$, $G_{ij}$ and $H_i$ respectively. Notice that for $i = j$, $G_{ii}$ becomes
the mass function (possibly of total variation less than one) representing the
probabilistic behavior of the first passage time of the process from state $i$ into
state $i$, but that the process need not have left state $i$ during this recurrence
time.

The two quantities defined by (3.1) are closely related both to each other 
and to the elements of the basic Q-matrix of the M.R.P. being studied. In the 
next three lemmas these relationships are explicitly stated. First of all, the 
relationship between the $P_{ij}$’s and the $Q_{ij}$’s is given in

LEMMA 3.1. For all $t \ge 0$, $s > 0$,

(3.2) $P_{ij}(t) = \delta_{ij} - \sum_{k=1}^{m} [\delta_{ij} - P_{kj}(t)] * Q_{ik}(t)$,

(3.3) $\pi_{ij}(s) = \delta_{ij} - \sum_{k=1}^{m} [\delta_{ij} - \pi_{kj}(s)]\, q_{ik}(s)$.

[Throughout this paper, $\delta_{ij}$ is used to indicate both the Kronecker delta and the
function $\delta_{ij}U_0(\cdot)$, where $U_c$ is the d.f. with unit jump at $c$, whose domain
(possibly restricted, as it is in (3.3)) is determined by the context. A similar remark
applies to any real constant.]
PROOF. Define for all $t \ge 0$, $n \ge 0$,

(3.4) $P_{ij}(t; n) = P[Z_t = j,\ N(t) = n \mid Z_0 = i]$.

By Lemma I.4.1, $N(t) < \infty$ a.s. since $m < \infty$, and so $P_{ij}(t) = \sum_{n=0}^{\infty} P_{ij}(t; n)$.
The quantities $P_{ij}(t; n)$ are straightforwardly shown to satisfy
$P_{ij}(t; 0) = \delta_{ij}[1 - H_i(t)]$ and, for $n > 0$,

(3.5) $P_{ij}(t; n) = \sum_{k=1}^{m} P_{kj}(t; n - 1) * Q_{ik}(t)$.

Upon summing over $n$ in these last two equations, (3.2) is obtained immediately.
(3.3) then follows upon taking L.-S. transforms of the former.
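
The summation step in the proof can be written out as follows. This is only a sketch of the omitted algebra; it uses $H_i = \sum_k Q_{ik}$ and the convention that convolving with $\delta_{ij}U_0$ acts as multiplication by $\delta_{ij}$, and it interchanges the two sums on the assumption that all terms are non-negative.

```latex
\begin{align*}
P_{ij}(t) &= \sum_{n=0}^{\infty} P_{ij}(t;n)
           = \delta_{ij}\,[1 - H_i(t)]
             + \sum_{n=1}^{\infty} \sum_{k=1}^{m} P_{kj}(t;n-1) * Q_{ik}(t) \\
          &= \delta_{ij} - \delta_{ij} \sum_{k=1}^{m} Q_{ik}(t)
             + \sum_{k=1}^{m} P_{kj}(t) * Q_{ik}(t) \\
          &= \delta_{ij} - \sum_{k=1}^{m} \bigl[\delta_{ij} - P_{kj}(t)\bigr] * Q_{ik}(t).
\end{align*}
```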
The analogous relationship between the $G_{ij}$’s and the $Q_{ij}$’s is given without
proof in
LEMMA 3.2. For $t \ge 0$, $s > 0$,

(3.6) $G_{ij}(t) = \sum_{k=1}^{m} G_{kj}(t) * Q_{ik}(t) + [1 - G_{jj}(t)] * Q_{ij}(t)$,

(3.7) $g_{ij}(s) = \sum_{k=1}^{m} g_{kj}(s)\, q_{ik}(s) + [1 - g_{jj}(s)]\, q_{ij}(s)$.


Between the $G_{ij}$’s and the $P_{ij}$’s, there is a particularly tractable relationship,
as given in
THEOREM 3.1. For $t \ge 0$, $s > 0$,

(3.8) $P_{ij}(t) = P_{jj}(t) * G_{ij}(t) + \delta_{ij}[1 - H_i(t)]$,

(3.9) $\pi_{ij}(s) = \pi_{jj}(s)\, g_{ij}(s)$, $(i \ne j)$; $\qquad \pi_{jj}(s) = \dfrac{1 - h_j(s)}{1 - g_{jj}(s)}$.

PROOF. It suffices to remark roughly that for $i \ne j$, $P_{ij}(t)$ is the probability
of reaching state $j$ for the first time before $t$ (according to $G_{ij}$) and then, in the
remaining time, ending up in state $j$ (according to $P_{jj}$). In case $i = j$, one must
add to the above the probability that no transition occurs in $(0, t]$, namely,
$1 - H_i(t)$. (3.9) is an immediate consequence of (3.8).

Define

(3.10) $D_i(t) = 1 - P[Z_u = i,\ 0 \le u \le t \mid Z_0 = i]$.


Then $D_i$ is a mass function representing the duration of time that the Z-process
remains in state $i$. It is clear that $D_i = H_i$ in case $p_{ii} = 0$. In particular, $D_i = H_i$
for an S.-M.P. One may easily prove
LEMMA 3.3. For $t \ge 0$, $s > 0$,

(3.11) $D_i(t) = [H_i(t) - Q_{ii}(t)] * [1 - Q_{ii}(t)]^{(-1)}$, $\qquad d_i(s) = [h_i(s) - q_{ii}(s)][1 - q_{ii}(s)]^{-1}$.

Notice that $D_i$ must have total variation equal to either zero or one, and that
it equals zero if and only if $p_{ii} = 1$. In this case one would say that state $i$ is
an absorbing state.

The explicit evaluation of $P_{jj}$ in terms of $H_j$ and $G_{jj}$ determined by (3.9)
may be used to characterize a recurrent state in terms of the integrability of
$P_{jj}$. The reader is referred to Section I.5 for the definition of a recurrent state
in an M.R.P.

THEOREM 3.2. If $\eta_j < \infty$, then state $j$ is recurrent if and only if

(3.12) $\int_0^{\infty} P_{jj}(t)\, dt = \infty$.

PROOF. By its definition in (3.1) one has

(3.13) $\lim_{s \to 0} s^{-1}\pi_{jj}(s) = \lim_{s \to 0} \int_0^{\infty} e^{-st} P_{jj}(t)\, dt = \int_0^{\infty} P_{jj}(t)\, dt$

as a consequence of the Lebesgue Monotone Convergence Theorem, whether
or not the limit is finite. From Theorem 3.1, one obtains, when $\eta_j < \infty$,

(3.14) $\lim_{s \to 0} s^{-1}\pi_{jj}(s) = \lim_{s \to 0} \dfrac{1 - h_j(s)}{s}\,[1 - g_{jj}(s)]^{-1} = \eta_j [1 - G_{jj}(\infty)]^{-1}$.

Therefore (3.13) and (3.14) together imply that state $j$ is recurrent (i.e.,
$G_{jj}(\infty) = 1$) if and only if (3.12) holds, as required.

Note that because of Theorem I.5.1(a), it follows from the above theorem
that if $\eta_j < \infty$, then

$\int_0^{\infty} P_{jj}(t)\, dt < \infty$ if and only if $\sum_{n=0}^{\infty} p_{jj}^{(n)} < \infty$,

where $P^n = (p_{ij}^{(n)})$. It should also be emphasized that in the proof of Theorem
3.2, the following stronger result is obtained, namely, that if $\eta_j < \infty$,

$G_{jj}(\infty) = 1 - \eta_j \left[\int_0^{\infty} P_{jj}(t)\, dt\right]^{-1}$.


4. The matrices $\mathcal{P}$ and $\mathcal{G}$. In this section, the relationships given in Lemmas
3.1 and 3.2 will be solved, with explicit expressions for the $P_{ij}$ and the $G_{ij}$ in
terms of the $Q_{ij}$ being given in matrix notation. Theoretically, therefore, for
any M.R.P., the $G_{ij}$ and the $P_{ij}$ may be uniquely determined from the $Q_{ij}$.
Consider the matrix-valued functions defined by

$\mathcal{P} = (P_{ij})$, $\qquad \boldsymbol{\pi} = (\pi_{ij})$,
$\mathcal{G} = (G_{ij})$, $\qquad \mathbf{g} = (g_{ij})$,
$\mathcal{H} = (\delta_{ij}H_i)$, $\qquad \mathbf{h} = (\delta_{ij}h_i)$.

Define a convolution operation on matrix-valued functions (whenever the definition
is valid) by $K * L = (\sum_{k=1}^{m} K_{ik} * L_{kj})$, which is formally the same as
regular matrix multiplication except that the usual numerical product is replaced
with convolution. Let $I$ denote either the identity matrix $(\delta_{ij})$ or the
unit-step matrix-valued function $(\delta_{ij}U_0(\cdot))$. The domains of the various functions
will be clear from their contexts. For an arbitrary matrix-valued function $K$,
set $K^{(0)} = I$ and define $K^{(n)} = K^{(n-1)} * K$ for $n > 0$ and $(I - K)^{(-1)} = \sum_{n=0}^{\infty} K^{(n)}$
whenever the series converges. An explicit expression for $\mathcal{P}$ ($\boldsymbol{\pi}$) in terms of $Q$ and
$\mathcal{H}$ ($\mathbf{q}$ and $\mathbf{h}$) is given in
THEOREM 4.1.

(4.1) $\mathcal{P} = (I - Q)^{(-1)} * (I - \mathcal{H})$, $\qquad \boldsymbol{\pi} = (I - \mathbf{q})^{-1}(I - \mathbf{h})$,

the latter equation being defined over $(0, \infty)$, while $\boldsymbol{\pi}(0) = 0$.

PROOF. It suffices to show that $(P_{ij}(\cdot\,; n)) = Q^{(n)} * (I - \mathcal{H})$, where $P_{ij}(\cdot\,; n)$
is as defined by (3.4), since then, in view of the fact that $N(t) < \infty$ a.s., (4.1)
follows immediately upon summation over $n$. It is easily checked recursively
that $P_{ij}(\cdot\,; n) * Q_{kl} = Q_{kl} * P_{ij}(\cdot\,; n)$. It therefore follows that $(P_{ij}(\cdot\,; 0)) =
I - \mathcal{H}$ and $(P_{ij}(\cdot\,; n)) = Q * (P_{ij}(\cdot\,; n - 1))$, $(n > 0)$. Consequently, one
obtains

(4.2) $\mathcal{P} = \sum_{n=0}^{\infty} Q^{(n)} * (I - \mathcal{H})$,

which immediately yields

(4.3) $\boldsymbol{\pi}(s) = \sum_{n=0}^{\infty} [\mathbf{q}(s)]^n (I - \mathbf{h}(s))$,

for s > 0. To verify (4.1), let s be a fixed positive number and define c, = 
max;,; 9:i;($). Since by (i) of Definition 1.3.1. each mass function Q;; satisfies 
Q:;(0) = 0, and since m < ~, one hasc, < 1. Therefore O <= 4"(s) S c1 where 
the inequality is termwise and where 1 denotes the m X m matrix in which 
each term equals 1. Consequently the series in (4.3) converges. Moreover 
(I — 4(s)]>.2~0 9"(s) = I, and hence the inverse (I — 4)" = >> 9” is well 
defined for all s > 0. That «(0) = I follows directly from the definition, (3.1). 

It should be remarked here that equation (3.3) of Lemma 3.1 may be rewritten
in matrix notation as $\boldsymbol{\pi} = I - \mathbf{h} + \mathbf{q}\boldsymbol{\pi}$. Equation (4.1) of the above
theorem may then be considered as an immediate consequence of Lemma 3.1,
once the non-singularity of $I - \mathbf{q}$ is demonstrated.

For any $m \times m$ matrix (or matrix-valued function) $A = (a_{ij})$, define the
diagonal and off-diagonal parts of $A$ by

${}_dA = (\delta_{ij}a_{ij})$, $\qquad {}_oA = A - {}_dA$.

With this notation one may rewrite (3.7) of Lemma 3.2 and (3.9) of Theorem
3.1, respectively, as

(4.4) $\mathbf{g} = \mathbf{q}(I + {}_o\mathbf{g})$

and

(4.5) $\boldsymbol{\pi} = (I + {}_o\mathbf{g})(I - {}_d\mathbf{g})^{-1}(I - \mathbf{h})$.

Equivalently, because of (3.9), (4.5) may be rewritten as $\boldsymbol{\pi}({}_d\boldsymbol{\pi})^{-1} = I + {}_o\mathbf{g}$, since
for $s > 0$, $\pi_{jj}(s) > 0$ for all $j$. Consequently, substitution of this in (4.4) leads,
as a result of Theorem 4.1, to the proof of

THEOREM 4.2. As defined on $(0, \infty)$,

(4.6) $\mathbf{g} = \mathbf{q}\boldsymbol{\pi}({}_d\boldsymbol{\pi})^{-1} = \mathbf{q}(I - \mathbf{q})^{-1}\{{}_d[(I - \mathbf{q})^{-1}]\}^{-1}$.

An immediate consequence of Theorem 4.2 is that a formula can be given for
the mean recurrence times $\mu_{jj}$ as defined after (3.1). Set $\boldsymbol{\mu} = (\mu_{ij})$. Clearly
$\boldsymbol{\mu} = \lim_{s \to 0} s^{-1}(\mathbf{1} - \mathbf{g})$, and so from (4.6) it can be shown that

(4.7) ${}_d\boldsymbol{\mu} = \lim_{s \to 0} \{{}_d[s(I - \mathbf{q})^{-1}]\}^{-1}$,

where one must interpret $1/0 = \infty$.
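
As a numerical illustration of Theorem 4.2, the following sketch evaluates (4.6) at one fixed $s$ and checks the result against the matrix form (4.4) of Lemma 3.2. The three-state matrix `P`, the exponential holding-time rates `lam` and the value of `s` are hypothetical choices made only for this example; with exponential holding times one has $q_{ij}(s) = p_{ij}\lambda_i/(\lambda_i + s)$.

```python
def mat_mul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_inv(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def first_passage_transforms(q):
    """Return g = q (I - q)^{-1} {diag[(I - q)^{-1}]}^{-1}, as in (4.6)."""
    n = len(q)
    inv = mat_inv([[float(i == j) - q[i][j] for j in range(n)] for i in range(n)])
    q_inv = mat_mul(q, inv)
    # Right-multiplying by an inverse diagonal matrix divides column j by inv[j][j].
    return [[q_inv[i][j] / inv[j][j] for j in range(n)] for i in range(n)]

# Hypothetical example data (not taken from the paper).
P = [[0.0, 0.7, 0.3], [0.5, 0.0, 0.5], [0.2, 0.8, 0.0]]
lam = [1.0, 2.0, 0.5]
s = 0.4
q = [[P[i][j] * lam[i] / (lam[i] + s) for j in range(3)] for i in range(3)]
g = first_passage_transforms(q)

# Residual of (4.4): g_ij = q_ij + sum over k != j of q_ik g_kj.
residual = max(abs(g[i][j] - q[i][j]
                   - sum(q[i][k] * g[k][j] for k in range(3) if k != j))
               for i in range(3) for j in range(3))
```

Only one matrix inversion, that of $I - \mathbf{q}$, is needed; the division by the diagonal is elementwise.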


5. The probability distribution of $N_j(t)$. It is possible to obtain a system
of integral equations for the probability distributions of the r.v.’s $N_j(t)$ $(1 \le j
\le m)$. Explicit solutions of these equations are derived in terms of double
generating functions of these probabilities. Theoretically, therefore, the probabilities
are determined and, in particular, the moments of $N_j(t)$ may then be obtained
in the usual way.

Define for all $1 \le i, j \le m$, $k \ge 0$, $s, t \ge 0$, and $|z| < 1$,

(5.1) $v_{ij}(k; t) = P[N_j(t) = k \mid Z_0 = i]$, $\qquad \phi_{ij}(z; t) = \sum_{k=0}^{\infty} z^k v_{ij}(k; t)$,
$\qquad \psi_{ij}(z; s) = \int_0^{\infty} e^{-st}\, d_t\phi_{ij}(z; t)$,

and $V_k = (v_{ij}(k; \cdot))$, $\Psi_z = (\psi_{ij}(z; \cdot))$.
It is immediately seen that the $v_{ij}$ can be expressed either in terms of the $Q_{ij}$’s
or in terms of the $G_{ij}$ by means of the following integro-difference equations.
These expressions are self-explanatory and require no proofs. For $t \ge 0$

(5.2) $v_{ij}(k; t) = \sum_{r \ne j} v_{rj}(k; t) * Q_{ir}(t) + v_{jj}(k - 1; t) * Q_{ij}(t)$, $(k > 0)$,
$\qquad v_{ij}(0; t) = \sum_{r \ne j} v_{rj}(0; t) * Q_{ir}(t) + 1 - H_i(t)$

and

(5.3) $v_{ij}(k; t) = v_{jj}(k - 1; t) * G_{ij}(t)$, $\quad v_{ij}(0; t) = 1 - G_{ij}(t)$, $(k > 0)$.

The solutions to (5.3) are easily seen to be

(5.4) $v_{ij}(k; t) = G_{ij}(t) * G_{jj}^{(k-1)}(t) * [1 - G_{jj}(t)]$, $(k > 0)$,
$\qquad v_{ij}(0; t) = 1 - G_{ij}(t)$.

Actually (5.4) may be viewed as a known result in Renewal Theory. By (5.4),
the $v_{ij}$ are explicitly expressed in terms of the $G_{ij}$, which in turn have been
expressed in Theorem 4.2 in terms of the basic $Q_{ij}$’s. This relationship may be
more simply expressed by means of generating functions. Thus from (5.4), one
obtains for $|z| < 1$,

$\phi_{ij}(z; t) = 1 - G_{ij}(t) + zG_{ij}(t) * [1 - zG_{jj}(t)]^{(-1)} * [1 - G_{jj}(t)]$,

and so for $s \ge 0$

$\psi_{ij}(z; s) = 1 - g_{ij}(s) + zg_{ij}(s)[1 - g_{jj}(s)][1 - zg_{jj}(s)]^{-1}$.

From the last expression, one obtains
THEOREM 5.1. As defined over $(0, \infty)$ for $|z| \le 1$, one has

(5.5) $\Psi_z = \mathbf{1} - (1 - z)\,\mathbf{g}\,(I - z\,{}_d\mathbf{g})^{-1}$.

Substitution for $\mathbf{g}$ in terms of $\mathbf{q}$ may be made in (5.5) by Theorem 4.2, thus
giving an explicit expression for $\Psi_z$ in terms of $\mathbf{q}$ (cf. Corollary 5.1 below).
Notice that $\Psi_0 = \mathbf{1} - \mathbf{g}$ and $\Psi_1 = \mathbf{1}$, as required.

An alternative derivation of (5.5) is to bypass the explicit expression (5.4)
and to derive the analogous equations to (5.3) for the generating functions. By
so doing, one would obtain directly the matrix equation

(5.6) $\Psi_z - z\,\mathbf{g}\,{}_d\Psi_z = \mathbf{1} - \mathbf{g}$.

Now for any square matrices $B$, $C$, $D$ for which $c_{ii} \ne 1$ for every $i$, the solution
for $B$ in the equation $B - C\,{}_dB = D$ is easily checked to be

(5.7) $B = D + C(I - {}_dC)^{-1}\,{}_dD$,

since the given equation implies that $(I - {}_dC)\,{}_dB = {}_dD$, and the assumption
permits the taking of the inverse of $I - {}_dC$. Theorem 5.1 may then be obtained
by applying (5.7) to (5.6).
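
The solution formula (5.7) is easy to check numerically. The sketch below is a minimal illustration with arbitrary, hypothetical $2 \times 2$ matrices `C` and `D`; it uses the fact that multiplying on the right by a diagonal matrix scales columns, so no general matrix inverse is required.

```python
def solve_diagonal_equation(C, D):
    """Solve B - C * diag(B) = D for B via (5.7): B = D + C (I - dC)^{-1} dD.

    Requires C[j][j] != 1 for every j.
    """
    n = len(C)
    return [[D[i][j] + C[i][j] * D[j][j] / (1.0 - C[j][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical example data.
C = [[0.2, 0.5], [0.1, 0.4]]
D = [[1.0, 2.0], [3.0, 4.0]]
B = solve_diagonal_equation(C, D)

# Residual of the original equation B - C dB = D, entry by entry.
residual = max(abs(B[i][j] - C[i][j] * B[j][j] - D[i][j])
               for i in range(2) for j in range(2))
```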

In terms of generating functions the equation (5.2) becomes

$\psi_{ij}(z; s) = \sum_{r=1}^{m} q_{ir}(s)\psi_{rj}(z; s) - (1 - z)q_{ij}(s)\psi_{jj}(z; s) + 1 - h_i(s)$,

or in matrix notation

(5.8) $(I - \mathbf{q})\Psi_z + (1 - z)\,\mathbf{q}\,{}_d\Psi_z = (I - \mathbf{h})\mathbf{1}$.

As in the proof of Theorem 4.1, $(I - \mathbf{q})$ is non-singular. Therefore, since
$\mathbf{q}\mathbf{1} = \mathbf{h}\mathbf{1}$, (5.8) may be rewritten as

$\Psi_z + (1 - z)(I - \mathbf{q})^{-1}\mathbf{q}\,{}_d\Psi_z = \mathbf{1}$,

whose solution, following (5.7), is given in
COROLLARY 5.1. As defined over $(0, \infty)$ for $|z| \le 1$, one has

(5.9) $\Psi_z = \mathbf{1} - (1 - z)(I - \mathbf{q})^{-1}\mathbf{q}\,\{I + (1 - z)\,{}_d[(I - \mathbf{q})^{-1}\mathbf{q}]\}^{-1}$.

This is indeed a corollary of Theorem 5.1, since, as mentioned earlier, it can be
obtained from (5.5) by a substitution of (4.6). However, the above more direct
approach is of interest in its own right.

Although the explicit result given in (5.9) is somewhat complex in appearance,
it should be observed that it implies that when computing $\Psi_z$ from $\mathbf{q}$, only one
major computation must be made, namely the inversion of $(I - \mathbf{q})$. This remark
applies also to the results of the preceding section. The reader should also note
that Theorem 4.2 may be viewed as a corollary to (5.9), since by definition
$\mathbf{1} - \Psi_0 = \mathbf{g}$.

Clearly the moments of $N_j(t)$, or more precisely, the L.-S. transforms of
these moments, may be obtained by successive differentiations of $\Psi_z$. In particular,
the L.-S. transform of the expectation of $N_j(t)$ is readily obtained from
(5.9) because of the special form of $\Psi_z$. Define for $t \ge 0$, $s > 0$

(5.10) $M_{ij}(t) = E[N_j(t) \mid Z_0 = i]$, $\qquad m_{ij}(s) = \int_0^{\infty} e^{-st}\, dM_{ij}(t)$,

and set $\mathcal{M} = (M_{ij}(\cdot))$, $\mathbf{m} = (m_{ij}(\cdot))$. We shall call $\mathcal{M}$ the Renewal function
of the process. Clearly

$\mathbf{m} = \lim_{z \to 1} (z - 1)^{-1}(\Psi_z - \mathbf{1})$.

From (5.5) and (5.9), one then obtains, respectively,

$\mathbf{m} = \mathbf{g}(I - {}_d\mathbf{g})^{-1} = \mathbf{q}(I - \mathbf{q})^{-1}$,

thus proving

THEOREM 5.2. For an M.R.P. with $m < \infty$, the (conditional) expectations of
$N_j(t)$ satisfy

(5.11) $\mathcal{M} = Q * (I - Q)^{(-1)} = (I - Q)^{(-1)} - I$ on $[0, \infty)$,

(5.12) $\mathbf{m} = \mathbf{q}(I - \mathbf{q})^{-1} = (I - \mathbf{q})^{-1} - I$ on $(0, \infty)$.


This is, in several respects, a very important result, if not at first sight amazing.
First of all, it implies that a knowledge of $\mathcal{M}$ is equivalent to a knowledge of $Q$.
That is, an M.R.P. is equally well determined by $(m, A, \mathcal{M})$ as by $(m, A, Q)$.
Because of known results in Renewal theory, the moments of the $G_{ij}$ may be
determined by a knowledge of the asymptotic behavior of the $M_{ij}$. In the next
section we shall briefly consider some special cases of M.R.P.’s, one of which is
the case of a Renewal process (the case of $m = 1$). The striking similarity between
(5.11) and (5.12) and the corresponding known results for Renewal
processes will then become apparent. It seems to this author that the suitability
of the name Markov Renewal processes for the stochastic processes being studied
in these papers is best supported by this similarity, together with the ease with
which Renewal theory yields limiting results for these processes, as is demonstrated
in Section 7.

Theorem 4.2 together with (5.12) shows that one may write

(5.13) $\mathcal{M} = \mathcal{G} * ({}_d\mathcal{M} + I)$, $\qquad \mathbf{g} = \mathbf{m}({}_d\mathbf{m} + I)^{-1}$,

which implies $M_{ij} = G_{ij} * M_{jj} + G_{ij}$.

Because of the basic nature of Theorem 5.2, it is desirable to consider the following
more direct proof, which affords a much clearer insight into this relationship
between $\mathcal{M}$ and $Q$, making it intuitive and natural, rather than “amazing”.
From the definition of an M.R.P. (in particular, I.(3.6)) it follows that 


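(5.14) $P[J_n = j,\ S_n \le t \mid Z_0 = i] = \sum_{S_{n,i,j}} Q_{\alpha_0\alpha_1} * Q_{\alpha_1\alpha_2} * \cdots * Q_{\alpha_{n-1}\alpha_n}(t)$,

where $S_{n,i,j}$ denotes the set of all sequences $(\alpha_0, \ldots, \alpha_n)$ of states with $\alpha_0 = i$ and $\alpha_n = j$.
Hence, either by induction or by recalling the interpretation of
$P^n$ in Markov Chain theory, one obtains

(5.15) $(P[J_n = j,\ S_n \le t \mid Z_0 = i]) = Q^{(n)} = (Q_{ij}^{(n)})$.

Furthermore, define for each $j$, r.v.’s $\{U_{n,j} ;\ n \ge 1\}$ by $U_{n,j} = 1$ if $J_n = j$ and
$S_n \le t$, and $= 0$ otherwise. Clearly $N_j(t) = \sum_{n=1}^{\infty} U_{n,j}$. Therefore, since
$(E[U_{n,j} \mid Z_0 = i]) = Q^{(n)}$, one obtains

(5.16) $\mathcal{M} = (E[N_j(t) \mid Z_0 = i]) = \sum_{n=1}^{\infty} Q^{(n)} = (I - Q)^{(-1)} - I$

as desired.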


6. Special cases of Markov Renewal processes.
(a) Markov Chains. As has been mentioned earlier, an M.R.P. becomes a
Markov Chain whenever $F_{ij} = U_1(\cdot)$ for all $i$, $j$. Hence for a Markov Chain,

$f_{ij}(s) = e^{-s}$, $\quad q_{ij}(s) = p_{ij}e^{-s}$, $\quad h_i(s) = e^{-s}$,
$\mathbf{f} = e^{-s}\mathbf{1}$, $\quad \mathbf{q} = e^{-s}P$, $\quad \mathbf{h} = e^{-s}I$.

Consider first of all the relationship (3.9), which becomes

(6.1) $\pi_{jj}(s) = \dfrac{1 - h_j(s)}{1 - g_{jj}(s)} = \dfrac{1 - e^{-s}}{1 - g_{jj}(s)}$.

Set $z = e^{-s}$. For purposes of this paragraph alone, introduce the notation

$F_j(z) = g_{jj}(-\log z) = \sum_{n=1}^{\infty} z^n f_j(n)$,

$U_j(z) = (1 - z)^{-1}\pi_{jj}(-\log z) = \sum_{n=0}^{\infty} z^n p_{jj}^{(n)}$,

where $f_j(n) = G_{jj}(n) - G_{jj}(n - 1)$ is the probability that state $j$ is reentered
for the first time at time $n$. Upon rewriting (6.1) with this notation, one obtains
$U_j(z) = [1 - F_j(z)]^{-1}$, which is a very well known relationship for Markov
Chains (for example, see Feller [6] pp. 285, 352).
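
The identity $U_j(z) = [1 - F_j(z)]^{-1}$ can be checked numerically for a small chain. In the sketch below the transition matrix `P`, the state `j`, the truncation order `N` and the point `z` are hypothetical choices; the first-return probabilities are computed independently by discarding, at each step, the paths that have already returned to $j$.

```python
def return_probabilities(P, j, N):
    """p_jj^(n) for n = 0..N, obtained by propagating the distribution started at j."""
    n_states = len(P)
    row = [float(i == j) for i in range(n_states)]
    out = [1.0]
    for _ in range(N):
        row = [sum(row[i] * P[i][k] for i in range(n_states)) for k in range(n_states)]
        out.append(row[j])
    return out

def first_return_probabilities(P, j, N):
    """f_j(n) for n = 0..N (with f_j(0) = 0): first re-entry into j exactly at time n."""
    n_states = len(P)
    row = [float(i == j) for i in range(n_states)]
    out = [0.0]
    for _ in range(N):
        row = [sum(row[i] * P[i][k] for i in range(n_states)) for k in range(n_states)]
        out.append(row[j])
        row[j] = 0.0  # remove paths that have already returned to j
    return out

# Hypothetical example data.
P = [[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.5, 0.0]]
j, N, z = 0, 60, 0.5
p = return_probabilities(P, j, N)
f = first_return_probabilities(P, j, N)

# Truncated generating functions; the neglected tail is of order z**(N + 1).
U = sum(z ** n * p[n] for n in range(N + 1))
F = sum(z ** n * f[n] for n in range(N + 1))
```

With these truncations the product `U * (1 - F)` should agree with 1 up to the neglected tail.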

Consider now equation (4.1). For a Markov Chain it becomes

$\mathcal{P} = \sum_{k=0}^{\infty} U_k(\cdot)P^k * [1 - U_1(\cdot)]$, $\qquad \boldsymbol{\pi}(s) = (I - e^{-s}P)^{-1}(1 - e^{-s})$.

In particular, one obtains the known result that

$\mathcal{P}(n) = \sum_{k=0}^{\infty} [U_k(n) - U_{k+1}(n)]P^k = P^n$.

Moreover, equation (4.6) of Theorem 4.2 becomes

$\mathbf{g}(s) = e^{-s}P(I - e^{-s}P)^{-1}\{{}_d[(I - e^{-s}P)^{-1}]\}^{-1}$,

while (4.7) becomes

(6.2) ${}_d\boldsymbol{\mu} = \lim_{s \to 0} \{{}_d[s(I - e^{-s}P)^{-1}]\}^{-1}$.

By a straightforward generalization of well known Abelian and Tauberian
theorems for series, one obtains for matrices that

$\lim_{z \to 1} (1 - z)(I - zP)^{-1} = \lim_{z \to 1} (1 - z)\sum_{k=0}^{\infty} z^k P^k = L$

if and only if

(6.3) $\lim_{n \to \infty} n^{-1}(I + P + \cdots + P^n) = L$.

Since ${}_d\boldsymbol{\mu}$ obviously exists (possibly with some infinite entries) one deduces from
(6.2) the well known ergodic result that ${}_d\boldsymbol{\mu} = ({}_dL)^{-1}$.

From Theorem 5.2 one obtains for a Markov Chain that, for $n > 0$, $\mathcal{M}(n) =
\sum_{k=1}^{n} P^k$, while, quite obviously, $\mathcal{M}$ is constant over every interval of the form
$[n, n + 1)$.
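
For a Markov Chain the Renewal function is therefore a finite sum of matrix powers, which the following minimal sketch computes. The matrix `P` is a hypothetical example. Since exactly one transition occurs per unit of time, each row of $\mathcal{M}(n)$ must sum to $n$, which gives a simple sanity check.

```python
def mat_mul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def renewal_function(P, n):
    """Return M(n) = P + P^2 + ... + P^n for a Markov Chain with transition matrix P."""
    m = len(P)
    total = [[0.0] * m for _ in range(m)]
    power = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(n):
        power = mat_mul(power, P)
        total = [[total[i][j] + power[i][j] for j in range(m)] for i in range(m)]
    return total

# Hypothetical example data.
P = [[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.5, 0.0]]
M5 = renewal_function(P, 5)
row_sums = [sum(row) for row in M5]
```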

(b) Continuous parameter Markov processes with finitely many states. Such a
process is a special case of an M.R.P. for which $p_{ii} = 0$ and $F_{ij}(t) = 1 - e^{-\lambda_i t}$
for appropriate finite $\lambda_i > 0$. Clearly $F_{ij} = H_i$. Setting $\boldsymbol{\eta} = (\delta_{ij}\eta_i) = (\delta_{ij}\lambda_i^{-1})$,
one may write

$\mathbf{q} = (I + s\boldsymbol{\eta})^{-1}P$,

and hence from (4.1) and (5.12) one obtains

$\boldsymbol{\pi}(s) = [I - (I + s\boldsymbol{\eta})^{-1}P]^{-1}[I - (I + s\boldsymbol{\eta})^{-1}]$,
$\mathbf{m}(s) + I = [I - (I + s\boldsymbol{\eta})^{-1}P]^{-1}$.

The corresponding expression for $\mathcal{P}$ may also be written down, and from it,
one would expect to be able to derive the semigroup property of $\mathcal{P}$. This does
not seem to be, however, a very simple deduction.
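
The expression for $\boldsymbol{\pi}(s)$ in case (b) can be compared with a case where the answer is known in closed form. The sketch below treats a hypothetical two-state process with rates `lam` that alternates between its states; the closed forms used in the comparison, $\pi_{11}(s) = (\lambda_2 + s)/(s + \lambda_1 + \lambda_2)$ and $\pi_{12}(s) = \lambda_1/(s + \lambda_1 + \lambda_2)$, are the L.-S. transforms of the standard two-state transition probabilities and are not taken from the paper.

```python
def mat_inv(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def transition_transform(P, lam, s):
    """pi(s) = [I - (I + s*eta)^{-1} P]^{-1} [I - (I + s*eta)^{-1}], eta_i = 1/lam_i."""
    n = len(P)
    a = [lam[i] / (lam[i] + s) for i in range(n)]  # diagonal of (I + s*eta)^{-1}
    left = mat_inv([[float(i == j) - a[i] * P[i][j] for j in range(n)]
                    for i in range(n)])
    # Right-multiplication by the diagonal matrix I - (I + s*eta)^{-1} scales columns.
    return [[left[i][j] * (1.0 - a[j]) for j in range(n)] for i in range(n)]

# Hypothetical example data: two states visited alternately.
P = [[0.0, 1.0], [1.0, 0.0]]
lam = [1.5, 0.5]
s = 0.3
pi = transition_transform(P, lam, s)
closed_11 = (lam[1] + s) / (s + lam[0] + lam[1])
closed_12 = lam[0] / (s + lam[0] + lam[1])
```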

(c) Renewal processes. A Renewal process (R.P.) is defined as a sequence of
independent and identically distributed r.v.’s, say $\{X_n : n \ge 1\}$. [The reader is
referred to the survey paper by Smith [7] for details of Renewal theory, as well
as for complete references to the proofs of the theorem stated below.] Equivalently,
either the sequence of partial sums $\{S_n : n \ge 1\}$, or the process $\{N(t) : t \ge 0\}$
defined as before by $N(t) = \sup\{k : S_k \le t\}$, can be termed a Renewal process.
With emphasis upon the latter description, it is easily seen that the family of all
R.P.’s and the family of all M.R.P.’s having $m = 1$ are identical. Most of the
above results become either vacuously true (e.g., the results of Sections 3 and
4), or obvious (e.g., the results of Section 5) for R.P.’s. Note that for an R.P.,
$Q_{11} = F_{11}$. The analogues of Theorems 5.1, 5.2, for example, become, dropping
all redundant subscripts,

(6.4) $v(k; \cdot) = F^{(k)} - F^{(k+1)}$, $\qquad \psi(z; \cdot) = (1 - f)(1 - zf)^{-1}$,
$\qquad M = (1 - F)^{(-1)} - 1$, and $m = f(1 - f)^{-1}$,

all of which can be derived directly with very little effort. It is the similarity
between (5.12) and (6.4) that is referred to at the end of Section 5.

As may be seen in [7], Renewal theory goes very much deeper than these
finite results, the basic emphasis being on the study of the limiting behavior
of $M(t)$. In the following section, the limiting stationarity of an M.R.P. will be
discussed as an application of the main limit theorem of Renewal theory, due to
Blackwell and Smith, which states that if $k$ is any non-negative, non-increasing,
Lebesgue-integrable function defined on $[0, \infty)$, then

(6.5) $\lim_{t \to \infty} k(t) * M(t) = \begin{cases} \mu^{-1}\int_0^{\infty} k(x)\, dx & \text{if $F$ is non-lattice,} \\ \mu^{-1}h\sum_{n=0}^{\infty} k(nh) & \text{if $F$ is lattice with span $h$,} \end{cases}$

where $\mu$ is the first moment of $F$ and where, for the lattice case, $t \to \infty$ over multiples of $h$.


7. Stationary probabilities. In this section we derive the limiting form of a
certain d.f. pertaining to an M.R.P., and use this to select the appropriate initial
distribution for making the corresponding S.-M.P. stationary. The method of
derivation is to compute the pertinent d.f., a useful formula in itself, and to apply
the Blackwell-Smith theorem (6.5) to it to obtain its limiting form.⁵ In [3],
Smith derived the asymptotic form of the probabilities $P_{ij}(t)$ of an M.R.P.
For discrete or continuous parameter Markov processes it is known that such
is sufficient for ascertaining the initial distribution which makes the process
stationary. However, the problem of obtaining the stationary probabilities of
an M.R.P. is not solved by deriving the limits of the $P_{ij}(t)$. One must instead
consider the problem of finding the limit, as $t \to \infty$, of the probability of being
in state $j$ at time $t$, of making the next transition sometime before $t + x$ and
of this next transition being into state $k$.

5 As was pointed out by the referee, it is also possible to apply the more general Renewal
theorem of Smith (Corollary 2.1 of [3]) upon making a suitable redefinition of the state
space.


First of all, define formally the probability just referred to, as

$R_{ij}^{(k)}(x; t) = P[Z_t = j,\ J_{N(t)+1} = k,\ S_{N(t)+1} \le t + x \mid Z_0 = i]$.

Now by (5.14) and (5.15) one may straightforwardly show that

$R_{ij}^{(k)}(x; t) = \sum_{n=0}^{\infty} P[J_n = j,\ J_{n+1} = k,\ S_n \le t < S_{n+1} \le t + x \mid Z_0 = i]$
$\qquad = \sum_{n=0}^{\infty} [Q_{jk}(t + x) - Q_{jk}(t)] * Q_{ij}^{(n)}(t)$.

Hence by Theorem 5.2 and (5.13) one may write

(7.1) $R_{ij}^{(k)}(x; t) = [Q_{jk}(t + x) - Q_{jk}(t)] * [M_{ij}(t) + \delta_{ij}U_0(t)]$
$\qquad = [Q_{jk}(t + x) - Q_{jk}(t)] * G_{ij}(t) * [M_{jj}(t) + U_0(t)] + \delta_{ij}[Q_{jk}(t + x) - Q_{jk}(t)]$.

Upon applying the Renewal theorem, (6.5), to (7.1) with $M_{jj} = M$ and $k(t)$
equal first to $p_{jk} - Q_{jk}(t)$ and then to $p_{jk} - Q_{jk}(t + x)$,⁶ one obtains that if
$G_{jj}$ is a non-lattice d.f. (which may be seen to be equivalent to assuming that
$j$ is recurrent and that not every non-zero $Q_{ir}$ for $i, r \in C_j$ is a lattice mass function)
and if $b_{jk} < \infty$,

(7.2) $\lim_{t \to \infty} R_{ij}^{(k)}(x; t) = G_{ij}(\infty)\,\mu_{jj}^{-1}\int_0^{\infty} [Q_{jk}(t + x) - Q_{jk}(t)]\, dt$
$\qquad = G_{ij}(\infty)\, p_{jk}\, \mu_{jj}^{-1}\int_0^{x} [1 - F_{jk}(y)]\, dy$.

If $G_{jj}$ is a lattice d.f. of span $h$ (that is, $j$ is recurrent and all non-zero $Q_{ir}$ for
$i, r \in C_j$ are lattice mass functions) and if $b_{jk} < \infty$, then

(7.3) $\lim_{n \to \infty} R_{ij}^{(k)}(x; nh) = G_{ij}(\infty)\, h\, p_{jk}\, \mu_{jj}^{-1}\sum_{n=0}^{[x/h]+1} [1 - F_{jk}(nh)]$,

where $[y]$ is the largest integer less than $y$.

Suppose G;; has variation less than one, and so is not a d.f. Then, as seen, 
for example, from (5.13), lim... M;;(t) = [1 — G;;()]’ < © and hence it 
follows directly from (7.1) that im... RS? (a; t) = 0 for all i, k and all x = 0. 
The above results are summarized in 

THEOREM 7.1.

(i) If state $j$ is recurrent and $b_{jk} < \infty$, then

(7.4) $\lim_{t \to \infty} R_{ij}^{(k)}(x; t) = G_{ij}(\infty)\, p_{jk}\, \mu_{jj}^{-1}\int_0^{x} [1 - F_{jk}(y)]\, dy$,

where it is understood that if $G_{jj}$ is a lattice d.f. then both $t$ and $x$ may take on as
values only multiples of its span.
(ii) If state $j$ is transient, then $\lim_{t \to \infty} R_{ij}^{(k)}(x; t) = 0$ for all $i$, $k$ and all $x \ge 0$.

Since $P_{ij}(t) = \sum_{k=1}^{m} R_{ij}^{(k)}(\infty; t)$ and $m < \infty$, one obtains as a consequence
of this theorem, the following result of Smith ([3], Theorem 5).

COROLLARY 7.1. For an M.R.P. for which $\eta_j < \infty$,

(7.5) $\lim_{t \to \infty} P_{ij}(t) = G_{ij}(\infty)\,\eta_j/\mu_{jj}$,

with the understanding that $t$ takes on only multiples of the span if $G_{jj}$ is a lattice
d.f.

6 It may easily be demonstrated that the function $G_{ij}$ causes no difficulty in applying
(6.5), for which the family of $k$ functions has been unnecessarily restricted.

Actually, in [3] the right hand side of (7.5) is given as

$G_{ij}(\infty)\int_0^{\infty} x\, dD_j(x) \Big/ \int_0^{\infty} x\, dK_{jj}(x)$,

where $D_j$ is as defined in (3.10) and where

$K_{ij}(t) = P[Z_u \ne i,\ Z_v = j \text{ for some } u < v \le t \mid Z_0 = i]$.

That is, $K_{ij}$ is the first passage time d.f. of state $j$ in the corresponding S.-M.P.
determined by $(m, A, Q^*)$ (cf. Section 3 of [1]), in which a transition into state
$i$ from itself is not observed. The equivalence of the two limits follows from (3.11)
and the relationships for $i \ne j$:

(7.6) $P_{ij} = K_{ij} * (1 - K_{jj})^{(-1)} * (1 - D_j)$, $\qquad \pi_{ij} = k_{ij}(1 - d_j)(1 - k_{jj})^{-1}$,
(7.7) $P_{jj} = (1 - K_{jj})^{(-1)} * (1 - D_j)$, $\qquad \pi_{jj} = (1 - d_j)(1 - k_{jj})^{-1}$,
(7.8) $K_{jj} = (1 - Q_{jj})^{(-1)} * (G_{jj} - Q_{jj})$, $\qquad k_{jj} = (g_{jj} - q_{jj})(1 - q_{jj})^{-1}$,

which are obtained using the methods and results of Section 3.
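
The limit $\eta_j/\mu_{jj}$ in (7.5) can be illustrated numerically by evaluating (4.7) at a small positive $s$. The sketch below does this for a hypothetical three-state process with exponential holding times, so that $q_{ij}(s) = p_{ij}\lambda_i/(\lambda_i + s)$. As an independent cross-check it uses the standard fact, not stated in this paper, that for an irreducible process the limiting probabilities are proportional to $\nu_j\eta_j$, where $\nu$ is the stationary vector of the matrix $P$; the power iteration used to find $\nu$ assumes $P$ is aperiodic.

```python
def mat_inv(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mean_recurrence_times(P, lam, s=1e-7):
    """Approximate mu_jj by 1 / (s * [(I - q(s))^{-1}]_jj), i.e. (4.7) at a small s."""
    n = len(P)
    q = [[P[i][j] * lam[i] / (lam[i] + s) for j in range(n)] for i in range(n)]
    inv = mat_inv([[float(i == j) - q[i][j] for j in range(n)] for i in range(n)])
    return [1.0 / (s * inv[j][j]) for j in range(n)]

def embedded_stationary(P, iterations=5000):
    """Stationary vector of P by power iteration (assumes P irreducible, aperiodic)."""
    n = len(P)
    nu = [1.0 / n] * n
    for _ in range(iterations):
        nu = [sum(nu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return nu

# Hypothetical example data.
P = [[0.0, 0.7, 0.3], [0.5, 0.0, 0.5], [0.2, 0.8, 0.0]]
lam = [1.0, 2.0, 0.5]
eta = [1.0 / rate for rate in lam]            # mean holding times
mu = mean_recurrence_times(P, lam)
limits = [eta[j] / mu[j] for j in range(3)]   # right hand side of (7.5), G_ij(inf) = 1

nu = embedded_stationary(P)
norm = sum(nu[i] * eta[i] for i in range(3))
cross_check = [nu[j] * eta[j] / norm for j in range(3)]
```

The value of `s` trades truncation error (of order `s`) against the conditioning of $I - \mathbf{q}(s)$, which becomes nearly singular as `s` tends to zero.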

For a Renewal process ($m = 1$), Theorem 7.1 reduces to a result due
essentially to Doob [8] (cf. also Smith [9]). Actually if there were to exist only
one recurrent state (and hence one for which $p_{jj} = 1$) then Theorem 7.1 is
essentially this result.

Assume throughout the remainder of this section that the M.R.P. under
consideration has only one recurrent class, $C$ say, and that it is a positive class.
By Theorem I.5.1(c) this means that $\eta_j < \infty$ for all $j \in C$. Assume also, for
simplicity of notation, that each $G_{jj}$ is non-lattice for every recurrent state $j$.
Under these assumptions it follows that for every state $i$, $G_{ij}(\infty) = 1$ and
$\mu_{jj} < \infty$ whenever $j \in C$. Consequently,

(7.9) $\lim_{t \to \infty} R_{ij}^{(k)}(x; t) = \begin{cases} p_{jk}\,\mu_{jj}^{-1}\int_0^{x} [1 - F_{jk}(y)]\, dy & \text{if } j \in C, \\ 0 & \text{if } j \notin C, \end{cases}$

which limits are independent of $i$.
Define now a slightly more general process than an M.R.P., namely one in
which the first transition time and state, $(X_1, J_1)$, has a d.f. determined by an
auxiliary matrix $\tilde{Q}$ of transition distributions. That is, it may be considered as
an M.R.P. with a random origin determined by $\tilde{Q}$. In keeping with existing
terminology in Renewal theory, define a general Markov Renewal process
(G.M.R.P.) determined by $(m, A, \tilde{Q}, Q)$ as a functional of the process
$\{(J_n, X_n) : n \ge 0\}$ in exactly the same way as is an M.R.P. (cf. Section I.3),
with the following different probabilistic structure on the $(J, X)$-process. Let

(7.10) $X_0 = 0$ a.s., $\quad P[J_0 = k] = a_k$,
$\qquad P[J_1 = k,\ X_1 \le x \mid J_0] = \tilde{Q}_{J_0,k}(x)$,
$\qquad P[J_n = k,\ X_n \le x \mid J_0, J_1, X_1, \ldots, J_{n-1}, X_{n-1}] = Q_{J_{n-1},k}(x)$

for $n > 1$. Compare this description particularly with that of Definition I.3.3.
A similar definition may be made for a G.S.-M.P.
Define $\tilde{A} = (\tilde{a}_1, \ldots, \tilde{a}_m)$ with $\tilde{a}_j = \eta_j\mu_{jj}^{-1}$, and $\tilde{Q} = (\tilde{Q}_{jk})$ with

(7.11) $\tilde{Q}_{jk}(t) = p_{jk}\,\eta_j^{-1}\int_0^{t} [1 - F_{jk}(y)]\, dy$.

We wish to show that the G.S.-M.P. determined by $(m, \tilde{A}, \tilde{Q}, Q)$, which shall
be denoted as the $\tilde{Z}$-process, is a stationary process.


From the definition of the functions $R_{ij}^{(k)}(x; t)$ it may easily be seen that they
satisfy the following recursion relationship for all $t, s \ge 0$:

(7.12) $R_{ij}^{(k)}(x; t + s) = \sum_{r,u=1}^{m} \int_0^{s} R_{uj}^{(k)}(x; s - y)\, d_y R_{ir}^{(u)}(y; t) + R_{ij}^{(k)}(s + x; t) - R_{ij}^{(k)}(s; t)$.

Upon defining

(7.13) $R_{jk}(x) = \lim_{t \to \infty} R_{ij}^{(k)}(x; t) = \tilde{a}_j\tilde{Q}_{jk}(x)$

by (7.9) and (7.11), and letting $t \to \infty$ in (7.12), one obtains

(7.14) $R_{jk}(x) = \sum_{r,u=1}^{m} \int_0^{s} R_{uj}^{(k)}(x; s - y)\, dR_{ru}(y) + R_{jk}(s + x) - R_{jk}(s)$

for all $x, s \ge 0$ and $1 \le j, k \le m$. For all $x, t \ge 0$, $1 \le i, j, k \le m$, define

$\tilde{R}_{ij}^{(k)}(x; t) = P[\tilde{Z}_t = j,\ \tilde{J}_{\tilde{N}(t)+1} = k,\ \tilde{S}_{\tilde{N}(t)+1} \le t + x \mid \tilde{Z}_0 = i]$,

which is to say that $\tilde{R}_{ij}^{(k)}(x; t)$ is the counterpart of $R_{ij}^{(k)}(x; t)$ as defined for a
G.S.-M.P. For a G.S.-M.P. it may be seen that

(7.15) $\tilde{R}_{ij}^{(k)}(x; t) = \sum_{u=1}^{m} \int_0^{t} R_{uj}^{(k)}(x; t - y)\, d\tilde{Q}_{iu}(y) + \delta_{ij}[\tilde{Q}_{ik}(t + x) - \tilde{Q}_{ik}(t)]$.

In words this equation states that in order to be in state $j$ at time $t$ and make the
next transition into state $k$ before time $t + x$, when the process starts in state
$i$, one must either make a first transition at some time $y \le t$ into some arbitrary
state $u$ in accordance with $\tilde{Q}_{iu}$ and then in the remaining time $t - y$ end up in
state $j$ and make the next transition into state $k$ before time $t + x$, or, in case
$i = j$, make the first transition during the time interval $(t, t + x]$, and make
it into state $k$. The integrands in (7.15) are without ~’s because after the initial
transition has occurred the process behaves like an ordinary S.-M.P. From
(7.15) it follows that

(7.16) $\sum_{i=1}^{m} \tilde{a}_i\tilde{R}_{ij}^{(k)}(x; t) = \sum_{i,u=1}^{m} \int_0^{t} R_{uj}^{(k)}(x; t - y)\, d[\tilde{a}_i\tilde{Q}_{iu}(y)] + \tilde{a}_j\tilde{Q}_{jk}(t + x) - \tilde{a}_j\tilde{Q}_{jk}(t)$.

For the particular G.S.-M.P. determined by $(m, \tilde{A}, \tilde{Q}, Q)$ with $\tilde{A}$, $\tilde{Q}$ as defined
in the paragraph containing (7.11), (7.16) states, in view of (7.13),
that

$\sum_{i=1}^{m} \tilde{a}_i\tilde{R}_{ij}^{(k)}(x; t) = \sum_{i,u=1}^{m} \int_0^{t} R_{uj}^{(k)}(x; t - y)\, dR_{iu}(y) + R_{jk}(t + x) - R_{jk}(t)$.

As a consequence of (7.14), as well as of the definition of $\tilde{R}_{ij}^{(k)}(x; t)$, it therefore
follows that

$P[\tilde{Z}_t = j,\ \tilde{J}_{\tilde{N}(t)+1} = k,\ \tilde{S}_{\tilde{N}(t)+1} \le t + x] = R_{jk}(x) = p_{jk}\,\mu_{jj}^{-1}\int_0^{x} [1 - F_{jk}(y)]\, dy$,

which is independent of $t$. In particular $P[\tilde{Z}_t = j] = \eta_j\mu_{jj}^{-1} = \tilde{a}_j$.

For each $t$, define the three-dimensional r.v. $W_t = (\tilde{J}_{\tilde{N}(t)},\ \tilde{J}_{\tilde{N}(t)+1},\ \tilde{S}_{\tilde{N}(t)+1} - t)$
whose coordinates respectively record for a G.S.-M.P. the state it is in at time
$t$, the state into which the next transition will be made, and the remaining time
until the next transition will occur. It should be clear that the W-process is
essentially equivalent to the $\tilde{Z}$-process in that almost all sample functions of
the one can be determined from the other and vice versa. We now summarize
the results of the preceding paragraphs in

THEOREM 7.2. Consider a given S.-M.P. determined by $(m, A, Q)$ for which there
is only one positive class. Define $\tilde{A} = (\tilde{a}_1, \ldots, \tilde{a}_m)$ with $\tilde{a}_j = \eta_j\mu_{jj}^{-1}$ and $\tilde{Q} =
(\tilde{Q}_{jk})$ with

$\tilde{Q}_{jk}(t) = p_{jk}\,\eta_j^{-1}\int_0^{t} [1 - F_{jk}(y)]\, dy$,

the limiting transition distribution given in Theorem 7.1. Then the W-process which
corresponds to the G.S.-M.P. determined by $(m, \tilde{A}, \tilde{Q}, Q)$ is a stationary process
whose marginal d.f. is given by

$P[W_t \le (j, k, x)] = \sum_{r \le j,\ s \le k} p_{rs}\,\mu_{rr}^{-1}\int_0^{x} [1 - F_{rs}(y)]\, dy$.





1258 RONALD PYKE 


It should be clear that a result corresponding to that given in Theorem 7.2
is possible for a G.M.R.P. Indeed, if one represents the M.R.P. by the associated
S.-M.P. as defined in Section I.3, then the result for the G.M.R.P. could be
viewed as a special case of Theorem 7.2, generalized to cover the case of infinite
$m$. Since this paper has dealt only with the case of $m < \infty$, we shall leave the
analogue of Theorem 7.2 for M.R.P.’s until later.


8. Examples. In conclusion, we list briefly some specific examples of M.R.P.’s 
to indicate the broad scope of applications for this family of stochastic processes. 
First of all, the special case in which Q;; = p.F; for each i and j, the p,’s being 
real numbers and the /’;’s being d.f.’s, arises in electronic counter theory and is 
studied in detail in [10]. This special case, by a slight reinterpretation of the 
sample functions, is seen to be essentially equivalent to the “zero order’? M.R.P. 
in which Q;; = Q; for each i and j. 

A second very important special case of an M.R.P., although in one sense 
somewhat degenerate, is a zero-one process. That is, m = 2 and py = px = 0. 
For example, in a queueing model, the server is either in a busy state or in an 
idle state, and these states are entered alternately. In a counter problem, the 
counter is either dead or free, and it alternates between these two states. The 
only “parameters” of such zero-one processes are the two d.f.’s of the duration
times of the two states. Of course, one still assumes independence between the
successive time periods. For such a process it is clear that Q_{12} = F_{12} = G_{12} =
H_1 = D_1, Q_{21} = F_{21} = G_{21} = H_2 = D_2, and G_{11} = F_{12} * F_{21} = G_{22}. Set F_{12} =
F_1 and F_{21} = F_2. It therefore follows straightforwardly from Theorem 4.1
that, for i = 1, 2,


P_{ii}(t) = [1 − F_i(t)] * [1 − F_1(t) * F_2(t)]^{(-1)},
P_{3-i,i}(t) = [1 − F_i(t)] * F_{3-i}(t) * [1 − F_1(t) * F_2(t)]^{(-1)},


which checks with Theorem 1 of [11]. Zero-one processes arise in many problems, 
and hence have been studied by various authors with various emphases. Let 
it suffice here to mention the several papers by Takács (cf. [12] and references
contained therein) in which the total time spent in one of the states during a
given interval of time is particularly studied. The limiting normality of these
“sojourn” times, under general conditions, derived by Takács, will be a corollary
of a general central limit theorem given in [13].

In an M.R.P., the probability distribution of the next state depends only on 
the present state of the process. An important generalization of an M.R.P. 
arises if one allows this distribution to depend also upon the time it took for the 
last transition. A special class of such processes is seen to be the class of 2-dependent
stationary processes with non-negative r.v.’s, which fact indicates the
different approach which would have to be used in studying the more general
processes. The following example of an M.R.P. arises as a “first approximation”
to this generalization. Let F_1 and F_2 be d.f.’s. Define, for c > 0, G_c(x, y) =
F_1(x) if y < c and = F_2(x) if y ≥ c. For the process {X_n; n ≥ 1}, set

P[X_n ≤ x | X_{n-1}, ..., X_1] = G_c(x, X_{n-1}).


This process is clearly equivalent to an M.R.P. with two states, where the state
indicates the subscript of the d.f., F_1 or F_2, which was used last. Therefore, for
i = 1, 2,


Q_{i1}(x) = F_i(min[x, c]),    Q_{i2}(x) = F_i(max[x, c]) − F_i(c).
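
The threshold example can be illustrated by a small simulation. The following is only a sketch under stated assumptions: F_1 and F_2 are taken to be exponential d.f.'s (the text leaves them arbitrary), the state is read as the index of the d.f. from which the current transition time is drawn, and the function names are hypothetical. Under that reading the probability of moving from state i to state 1 is Q_{i1}(∞) = F_i(c), which the simulation estimates.

```python
import math
import random


def simulate_threshold_mrp(n_steps, c, rate1, rate2, seed=0):
    """Estimate Pr(next state = 1 | current state = i) for i = 1, 2."""
    rng = random.Random(seed)
    rate = {1: rate1, 2: rate2}      # assumption: F_i is exponential with rate[i]
    visits = {1: 0, 2: 0}
    moves_to_1 = {1: 0, 2: 0}
    state = 1
    for _ in range(n_steps):
        x = rng.expovariate(rate[state])   # transition time drawn from F_state
        nxt = 1 if x < c else 2            # threshold rule taken from G_c
        visits[state] += 1
        moves_to_1[state] += (nxt == 1)
        state = nxt
    return {i: moves_to_1[i] / visits[i] for i in (1, 2)}


def exp_cdf(rate, x):
    """Exponential d.f. with the given rate, evaluated at x >= 0."""
    return 1.0 - math.exp(-rate * x)


estimates = simulate_threshold_mrp(200_000, c=1.0, rate1=2.0, rate2=0.5)
exact = {1: exp_cdf(2.0, 1.0), 2: exp_cdf(0.5, 1.0)}   # F_i(c) for i = 1, 2
```

The estimated transition frequencies in `estimates` should lie close to the values F_i(c) stored in `exact`.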


Such a model and its analogue for arbitrary m may be applicable in life-testing
or behavioural problems. One particular application to an inventory problem
is the following. Suppose that state 2 indicates that a new efficient water pumping
station is in use while state 1 indicates that an old inefficient auxiliary station
is also in use. Suppose that the transition times represent the successive times
that it takes for the capacity of a reservoir to fall below a fixed level. We assume
here that the reservoir is instantly refilled at these times. For such a model,
the above example of an M.R.P. could be used in studying the cost of the pumping
system as well as the proportion of times the auxiliary pumping station is
used.
A CONVEXITY PROPERTY IN THE THEORY OF RANDOM VARIABLES
DEFINED ON A FINITE MARKOV CHAIN


H. D. MILLER
Statistical Laboratory, University of Cambridge 


1. Summary. Let P = (p_{jk}) be the transition matrix of an ergodic, finite
Markov chain with no cyclically moving sub-classes. For each possible transition
(j, k), let H_{jk}(x) be a distribution function admitting a moment generating
function f_{jk}(t) in an interval surrounding t = 0. The matrix P(t) = {p_{jk} f_{jk}(t)}
is of interest in the study of the random variable S_n = X_1 + ... + X_n, where
X_m has the distribution H_{jk}(x) if the mth transition takes the chain from state
j to state k. The matrix P(t) is non-negative and therefore possesses a maximal
positive eigenvalue α_1(t), which is shown to be a convex function of t. As an
application of the convexity property, we obtain an asymptotic expression for
the probability of tail values of the sum S_n, in the case where the X_m are integral
random variables.

The results are related to those of Blackwell and Hodges [1], whose methods 
are followed closely in Section 5, and Volkov [4], [5], who treats in detail the 
case of integer-valued functions of the state of the chain, i.e., the case f_{jk}(t) =
exp(θ_k t) (θ_k integral).


2. Introduction and notation. Let k_m (m = 0, 1, 2, ...) be the state at time m
of a finite N-state ergodic Markov chain with no cyclically moving sub-classes
and with transition matrix P = (p_{jk}), where p_{jk} = Pr(k_m = k | k_{m-1} = j),
j, k = 1, ..., N. The distribution of k_0 is unspecified, since we shall mostly deal
with probabilities conditional on k_0. It follows that P is a non-negative, primitive
and irreducible matrix. Let H_{jk}(x) be a distribution function associated
with the transition (j, k) (p_{jk} ≠ 0) and let f_{jk}(t) be the corresponding moment
generating function, i.e.,


f_{jk}(t) = ∫_{-∞}^{∞} e^{tx} dH_{jk}(x).


We shall suppose that each f_{jk}(t) is analytic in a strip which strictly contains
the imaginary axis of the complex t-plane. There will therefore be a maximal
strip


(2.1)    u_0' < Re(t) < u_0    (−∞ ≤ u_0' < 0 < u_0 ≤ ∞),


in which all the f_{jk}(t) are analytic.
Let X_m, m = 1, 2, ..., be a random variable having the distribution H_{jk}(x)
if k_{m-1} = j and k_m = k, i.e., if the mth transition is (j, k), and let
S_n = X_1 + ... + X_n. Let P(t) be the matrix {p_{jk} f_{jk}(t)} and let


(2.2)    [P(t)]^n = {p_{jk}^{(n)} f_{jk}^{(n)}(t)},


where P^n = {p_{jk}^{(n)}}. Then f_{jk}^{(n)}(t) is the moment generating function of S_n
conditional on the n-stage transition from state j at time 0 to state k at time n. Thus

(2.3)    f_{jk}^{(n)}(t) = E[exp(tS_n) | k_0 = j, k_n = k].


For real t, the matrix P(t) is non-negative and therefore it has a maximal
positive eigenvalue, the Perron root, which we denote by α_1(t). Thus α_1(0) = 1,
and, for real t, α_1(t) has the properties (i) α_1(t) > 0, (ii) α_1(t) > |α_j(t)|, where
α_j(t), j = 2, 3, ..., N, are the remaining eigenvalues of P(t).

We shall say that f(t) is a degenerate moment generating function if it is of
the form e^{βt} (β real) and we shall say that P(t) is degenerate if it is of the form


(2.4)    P(t) = e^{βt} D(t) P {D(t)}^{-1},


where D(t) is a diagonal matrix of degenerate moment generating functions.
If P(t) is degenerate, then the sum S_n is also degenerate in the sense that given
k_0 = j, k_n = k, S_n is deterministic and of the form S_n = nβ + β_j − β_k, where
D(t) = diag {exp(β_j t)}.

Let (p_k), k = 1, ..., N, be the unique ergodic distribution associated with
P. Then, if k_0 has the distribution (p_k) and if we take expectation unconditional
on k_1, it is easy to show that E(X_1) = α_1'(0). Thus α_1'(0) is a measure of the
ultimate drift of S_n.


3. Some properties of non-negative square matrices. For the sake of clarity 
we quote the following properties of non-negative square matrices from the paper 
of Debreu and Herstein [3]. 

(a) Let A ≥ 0 be an irreducible (indecomposable) square matrix, and let α_1
be its maximal positive eigenvalue. Then α_1 is a simple root of the equation
|αI − A| = 0, and there exist strictly positive left and right eigenvectors corresponding
to α_1. If σ is any other eigenvalue of A, then |σ| ≤ α_1, and if |σ| < α_1,
then A is said to be primitive.

(b) A finite stochastic matrix is the transition matrix of a Markov chain which 
is ergodic and without cyclically moving sub-classes if and only if it is primitive 
and irreducible. 

(c) Let B = (b_{jk}) be a square matrix with complex elements, and let B* =
(|b_{jk}|). If β is any eigenvalue of B, if A (≥ 0) is irreducible, and if B* ≤ A,
then |β| ≤ α_1. Moreover |β| = α_1 and B* ≤ A together imply that B* = A;
if β = α_1 e^{iθ}, then B = e^{iθ} D^{-1} A D where D* = I.

In addition we state the following lemma which we need in Sections 4 and 5. 
It is an immediate consequence of the left and right eigenvector relations. 

LEMMA 3.1. Let A be a non-negative, primitive and irreducible matrix of order
N × N. Let α_1 be its maximal positive eigenvalue with corresponding left and right
positive eigenvectors y = (y_j) and x = (x_j) respectively, such that yx = 1. Let
X = diag(x_j). Then the matrix α_1^{-1} X^{-1} A X is a primitive, irreducible, stochastic
matrix with limiting probability vector (x_j y_j).


4. The properties of α_1(t). Let t = u + iv (u, v real). Then for t lying in the
strip (2.1), the f_{jk}(t) and P(t) satisfy the following conditions:


(i) f_{jk}(u) > 0,    (ii) |f_{jk}(t)| ≤ f_{jk}(u),    (iii) f_{jk}(0) = 1,
(iv) P(u) ≥ 0,    (v) {P(t)}* ≤ P(u),


where, in (v), we use the notation of Section 3(c).

THEOREM 1. 

(a) The function α_1(t) is regular at each point t = u of the real axis in the strip
(2.1).

(b) An eigenvalue of P(t) is of the form e^{βt} (β real) if and only if P(t) is degenerate,
i.e., of the form (2.4).

(c) In the strip (2.1) we have


(4.1)    α_1(u) ≥ |α_j(t)|    (j = 1, ..., N; t = u + iv).


PROOF. 

(a) Since for each real t, α_1(t) is a simple root of the determinantal equation
|αI − P(t)| = 0, and since |αI − P(t)| is an analytic function of the two complex
variables α and t, the result follows from the implicit function theorem for
analytic functions (Bochner and Martin [2], p. 39).

(b) If P(t) is of the form (2.4) then clearly α_1(t) = e^{βt}. If e^{βt} is an eigenvalue
of P(t), then we put t = iv (v real), and it follows from (4.1) (v) and Section
3(c) that P(iv) = e^{iβv} D(v) P {D(v)}^{-1}, where {D(v)}* = I. Thus |f_{jk}^{(n)}(iv)| = 1
for each j, k and n for which p_{jk}^{(n)} > 0, and since f_{jk}^{(n)}(iv) is a characteristic function,
we must have D(v) = diag {exp(iβ_j v)} (β_j real, j = 1, ..., N). Hence
P(t) is degenerate.

(c) The inequalities follow from (4.1) (v) and Section 3(c).

THEOREM 2. If P(t) is not degenerate, then α_1(t) (t real) is a strictly convex
function of t.

PROOF. We have the factorization


(4.2)    |αI − P(t)| = α^N {1 − α^{-1}α_1(t)}{1 − α^{-1}α_2(t)} ··· {1 − α^{-1}α_N(t)}

and we consider the t-roots of the equation

(4.3)    |αI − P(t)| = 0.


If |α| > α_1(u) (t = u + iv) it follows from (4.2) and Theorem 1(c) that
|αI − P(t)| ≠ 0. Thus there can be no t-roots of (4.3) in any part of the t-plane
for which |α| > α_1(u).

Now suppose α_1(u) is a concave function of u in some interval (u', u''). (The
argument will be simpler to follow with the aid of a diagram of the u, α_1(u)
plane.) We may choose real numbers a and b so that the linear function a + bu
satisfies

(4.4)    a + bu > α_1(u)    (u' < u < u''),


i.e., the line a + bu lies above the curve α_1(u) in the interval (u', u''). In (4.2)
let α = a + bt. Since


|a + bt| = {(a + bu)^2 + b^2 v^2}^{1/2} ≥ a + bu > α_1(u)    (u' < u < u''),


there are no roots of the equation

(4.5)    |(a + bt)I − P(t)| = 0


in the strip of the t-plane u' < u < u''. But the t-roots of (4.5) are continuous
functions of a, and we may choose values of a and b so that the line a + bu
cuts the curve α_1(u) in two points, thus producing two roots of (4.5) in the
strip (u', u''). Thus for a suitable b, there is a value of a, say a', such that for
a > a' there are no roots of (4.5) in the strip (u', u''), while for a < a' there
are two roots. This contradicts the continuity of the t-roots of (4.5) and therefore
α_1(u) cannot be concave in any interval.

Further, α_1(u) cannot be a linear function. For if α_1(u) = 1 + cu (c ≠ 0),
say, we can choose a real number θ so that the function e^{θu}(1 + cu) is concave
near the point u = 0. But e^{θt}α_1(t) is the maximal eigenvalue of the matrix e^{θt}P(t),
which is of the same type as P(t), and which cannot therefore have a concave
maximal eigenvalue.

It follows that α_1(u) is strictly convex.
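
The convexity asserted by Theorem 2 can be checked numerically on a small case. The sketch below is an illustration under assumptions that are not in the paper: a two-state chain with the transition matrix `P` shown, and transition variables taking only the values +1 and −1 with the probabilities in `Q_UP`. For a 2 × 2 non-negative matrix the Perron root has a closed form, so no linear-algebra library is needed.

```python
import math

# Assumed example chain (not from the paper): two states, stochastic matrix P.
P = [[0.5, 0.5], [0.3, 0.7]]
# Assumed step law: on transition (j, k), X = +1 with prob. Q_UP[j][k], else X = -1.
Q_UP = [[0.6, 0.2], [0.5, 0.4]]


def mgf(j, k, t):
    """Moment generating function f_jk(t) of the assumed +/-1 variable."""
    q = Q_UP[j][k]
    return q * math.exp(t) + (1.0 - q) * math.exp(-t)


def perron_root(t):
    """Maximal eigenvalue of the 2x2 matrix P(t) = {p_jk f_jk(t)}, t real."""
    a, b = P[0][0] * mgf(0, 0, t), P[0][1] * mgf(0, 1, t)
    c, d = P[1][0] * mgf(1, 0, t), P[1][1] * mgf(1, 1, t)
    trace, det = a + d, a * d - b * c
    # The discriminant equals (a - d)^2 + 4bc, so it is non-negative here.
    return 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))


h = 0.1
grid = [-2.0 + h * i for i in range(41)]
values = [perron_root(t) for t in grid]
# Central second differences; strict convexity makes every one of them positive.
second_diffs = [values[i - 1] - 2.0 * values[i] + values[i + 1]
                for i in range(1, len(values) - 1)]
```

Since this P(t) is not degenerate (each step can take two values), every entry of `second_diffs` should be positive, and `perron_root(0.0)` should equal 1.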

We may specialize our results to integral random variables. To this end, let
φ_{jk}(z) be a probability generating function associated with the transition (j, k)
and suppose that there is an annulus r_0' < |z| < r_0 (0 ≤ r_0' < 1 < r_0 ≤ ∞) in
which all the φ_{jk}(z) have convergent Laurent series. Let Q(z) denote the matrix
{p_{jk} φ_{jk}(z)} and we suppose that Q(z) is not of the degenerate form


(4.6)    Q(z) = z^θ Z P Z^{-1},


where θ is an integer and Z is a diagonal matrix of integral powers of z. For real
and positive z let α_1(z) be the maximal positive eigenvalue of Q(z). If we set
z = e^t, then, by Theorem 2, α_1(e^t) is a strictly convex function of t (t real) and
therefore α_1(z), though not necessarily convex, has the property of not having
a local maximum for real positive z. This generalizes the result of Volkov [4]
who demonstrated this property in the special case where φ_{jk}(z) = z^{m_k}.

We return to the matrix P(t) as defined in Section 2. The convexity property
of α_1(u) (t = u + iv) raises the question of whether α_1(u) attains its unique
minimum at a finite value of u. The answer is clearly affirmative if α_1'(0) = 0.
If α_1'(0) < 0 say, then either α_1(u) continues to decrease as u increases or it
reaches a minimum and then starts increasing. We distinguish between the
cases where the strip (2.1) includes the entire right half-plane (u_0 = ∞) and
where it is bounded to the right (u_0 < ∞). Modifications for the left half-plane
will be obvious (i.e., for the case where α_1'(0) > 0).
THEOREM 3. Let t = u + iv and suppose that P(t) is not degenerate.

(a) Suppose that α_1(u) is defined for all u > 0 (i.e., u_0 = ∞). Then a necessary
and sufficient condition for α_1(u) to be uniformly bounded (and so monotonic
decreasing) for all u > 0 is that there exists a diagonal matrix D(t) of degenerate
moment generating functions such that each element of the matrix


(4.7)    Q(t) = {D(t)}^{-1} P(t) D(t)


is of the form p_{jk} q_{jk}(t), q_{jk}(t) being the moment generating function of a non-positive
random variable. In the case of integral random variables, each element of D(t)
and each q_{jk}(t) will be the moment generating function of an integral random variable.

(b) Let α_1'(0) < 0 and suppose u_0 < ∞. Then α_1(u) attains its unique stationary
minimum at a finite positive value of u if one of the following conditions is
satisfied:

(i) There exists a number u_1 (0 < u_1 < u_0) such that for each j, k for which
f_{jk}(t) is defined, f_{jk}'(u_1) ≥ 0.

(ii) For some j, k, f_{jk}(u) → ∞ as u → u_0−.

PROOF. 

(a) If (4.7) is satisfied, then P(t) and Q(t) have the same eigenvalues. Since
each element of Q(t) is non-increasing for t > 0, it follows from Section 3(c)
that α_1(t) is non-increasing for t > 0 and therefore bounded for all t > 0.

Conversely, if α_1(u) is bounded for u > 0, we note that for each j, k for which
p_{jk} > 0, and for some finite, real β_{jk}, Pr(X > β_{jk}) = 0, where X is a random
variable with moment generating function f_{jk}(t). For if not, then we can find n
and j such that Pr(S_n > 0 | k_0 = j, k_n = j) > 0, which implies that f_{jj}^{(n)}(u) → ∞
as u → ∞. But this contradicts the boundedness of α_1(u) since {α_1(u)}^n ≥
p_{jj}^{(n)} f_{jj}^{(n)}(u). Thus for each j, k, f_{jk}(t) represents a random variable which is bounded
above and we may write


(4.8)    f_{jk}(t) = exp(β_{jk} t) g_{jk}(t),

where β_{jk} is real for each j, k and

(4.9)    g_{jk}(t) = o(e^{εt})    (t → +∞)


for every ε > 0. Let {x_j(t)} be a right eigenvector of P(t) corresponding to the
eigenvalue α_1(t). We can choose x_j(t) to be the co-factor of, say, the element in
position (1, j) of the matrix [α_1(t)I − P(t)] (j = 1, ..., N). Thus, for each j,
x_j(t) is expressible as a sum of products of the elements of [α_1(t)I − P(t)].
Hence from (4.8), (4.9) and the boundedness of α_1(t) (t > 0), it follows that
there is a finite real number β_j such that


(4.10)    x_j(t) = y_j(t) exp(β_j t),

where for each j and every ε > 0


(4.11)    y_j(t) = o(e^{εt}).


Let T(t) = diag {x_j(t)}. Then the matrix

(4.12)    {r_{jk}(t)} = [α_1(t)]^{-1} [T(t)]^{-1} P(t) T(t)


is a stochastic transition matrix for each real t by Lemma 3.1 and hence for all
real t we have 0 ≤ r_{jk}(t) ≤ 1. From (4.12) we have for each j, k


p_{jk} f_{jk}(t) x_k(t) = x_j(t) α_1(t) r_{jk}(t),


and from the relations (4.8) to (4.11) it follows that β_{jk} + β_k ≤ β_j for each
j, k for which p_{jk} > 0. The result now follows by taking D(t) = diag {exp(β_j t)}.

In the case where each f_{jk}(t) is the moment generating function of an integral
random variable, each β_{jk} and β_j will be an integer.

(b) In (i), we have α_1'(u_1) ≥ 0 since α_1(u) is a nondecreasing function of
each of the elements, and thus α_1(u) must attain its minimum in the interval
0 < u < u_1. In (ii) suppose that for some fixed j, k, f_{jk}(u) → ∞ as u → u_0−.
We choose n so that P^n > 0 and since u_0 < ∞, we can find C > 0 such that
f_{kj}^{(n)}(u) ≥ C as u → u_0−. Then we have


{α_1(u)}^{n+1} ≥ p_{jj}^{(n+1)} f_{jj}^{(n+1)}(u) ≥ p_{jk} p_{kj}^{(n)} f_{jk}(u) f_{kj}^{(n)}(u)
               ≥ C p_{jk} p_{kj}^{(n)} f_{jk}(u) → ∞    as u → u_0−.


Thus α_1(u) → ∞ as u → u_0− and the result follows.

We now explore further the properties of α_1(t) in the case of integral random
variables. We first state a well known result concerning characteristic functions
of integral random variables.

LEMMA 4.1. Let f(iv) = E(e^{ivX}) (v real) where X is a non-degenerate integral
random variable. Then |f(iv_1)| = 1 for some v_1 (v_1 ≠ 0, −π ≤ v_1 ≤ π) if and
only if v_1/2π is a rational number, say v_1 = 2πp/q (g.c.d. [p, q] = 1, q > 1),
and f(iv) is of the form e^{imv} g(iv), where m is an integer and g(iv) is a characteristic
function of period 2π/q in v, or equivalently if and only if X only takes values
of the form m + nq (n = 0, ±1, ±2, ...; m, q integral, q > 1).
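
Lemma 4.1 is easy to see on a concrete case. In the sketch below the two probability mass functions are assumptions chosen for illustration: one is supported on the lattice m + nq with m = 1, q = 3, the other on consecutive integers. The function name `char_fn` is hypothetical.

```python
import cmath
import math


def char_fn(pmf, v):
    """Characteristic function E exp(ivX) of an integer-valued X given as {value: prob}."""
    return sum(p * cmath.exp(1j * v * x) for x, p in pmf.items())


m, q = 1, 3
# Assumed example: X takes only values of the form m + n*q.
lattice_pmf = {m - q: 0.2, m: 0.5, m + q: 0.2, m + 2 * q: 0.1}
# Assumed example: X takes consecutive integer values, so no span q > 1 exists.
generic_pmf = {0: 0.3, 1: 0.4, 2: 0.3}

v1 = 2.0 * math.pi / q                      # the point v_1 = 2*pi*p/q with p = 1
mod_on_lattice = abs(char_fn(lattice_pmf, v1))
mod_generic = abs(char_fn(generic_pmf, v1))
```

For the lattice case the modulus at v_1 is 1, since f(iv_1) = e^{imv_1}; for the second distribution it is strictly less than 1.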

In the following theorem we prove a corresponding result for the functions
α_j(iv) and it is sufficient to suppose that each f_{jk}(t) exists only on the imaginary
axis.

THEOREM 4. Let t = iv (v real) and suppose that each of the functions f_{jk}(iv)
is a characteristic function of an integral random variable. Let α_j(iv) (j = 1, ..., N)
be the eigenvalues of P(iv) where α_1(0) = 1. Then there exists a number v_1 ≠ 0
(−π ≤ v_1 ≤ π) satisfying |α_j(iv_1)| = 1 for some j if and only if v_1/2π is a rational
number, say v_1 = 2πp/q (g.c.d. [p, q] = 1, q > 1), and P(iv) is of the form


(4.13)    P(iv) = e^{imv} D(iv) Q(iv) {D(iv)}^{-1}


where

(i) Q(iv) = {p_{jk} g_{jk}(iv)}, each g_{jk} being a characteristic function of period 2π/q
in v, possibly g_{jk}(iv) = 1;

(ii) D(iv) = diag {exp(i m_j v)} (m_1, ..., m_N integral);

(iii) m is integral.
PROOF. If P(iv) is of the form (4.13) then we may take v_1 = 2π/q and then
exp(imv_1) will be an eigenvalue of P(iv_1).

Conversely, suppose that e^{iθ} is an eigenvalue of P(iv_1) (v_1 ≠ 0, −π ≤ v_1 ≤ π).
Since {P(iv_1)}* ≤ P it follows from Section 3(c) that

(4.14)    P(iv_1) = e^{iθ} D P D^{-1},

where D* = I, and hence that

(4.15)    |f_{jk}^{(n)}(iv_1)| = 1,    each j, k and n such that p_{jk}^{(n)} > 0.

If f_{jk}^{(n)}(iv) is degenerate for each j, k and n for which p_{jk}^{(n)} > 0, then [{P(iv)}^n]* =
P^n for all v and n and thus |α_1(iv)| = 1 (all v). Hence, again by Section 3(c),
P(iv) = α_1(iv) D(v) P {D(v)}^{-1} where D(v) = diag {d_j(v)} (|d_j(v)| = 1,
j = 1, ..., N). For each j, k and all sufficiently large n, therefore,

{α_1(iv)}^n d_j(v) {d_k(v)}^{-1}

is a degenerate characteristic function, so that α_1(iv) must be of the form e^{iβv}
(β real and constant). It follows from Theorem 1(b) that P(iv) is of the form
(4.13) with Q(iv) = P.

Otherwise, for some j, k and n, f_{jk}^{(n)}(iv) is not degenerate and hence v_1 = 2πp/q
(g.c.d. [p, q] = 1, q > 1) by Lemma 4.1. In virtue of (4.15) we may write


(4.16)    f_{jk}^{(n)}(iv) = exp(i m_{jk}^{(n)} v) g_{jk}^{(n)}(iv),    each j, k, n such that p_{jk}^{(n)} > 0,


where g_{jk}^{(n)}(iv) is a characteristic function of period 2π/q and m_{jk}^{(n)} an integer.

In (4.14) let D = diag {exp(iβ_j)} (β_j real, j = 1, ..., N). Then (4.16) (with
v = v_1 = 2πp/q) implies that


(4.17)    2πp m_{jk}^{(n)}/q = nθ + β_j − β_k + 2N_{jk}^{(n)}π,    N_{jk}^{(n)} integral,


for each j, k and n such that p_{jk}^{(n)} > 0. By evaluating (4.17) at n and n + 1
(where n is such that P^n > 0) we obtain


θ = 2πpm/q + 2Mπ,    m, M integral,

and (4.17) for j, h and k, h gives the result

(4.18)    β_j − β_k = (m_{jh}^{(n)} − m_{kh}^{(n)}) 2πp/q − 2(N_{jh}^{(n)} − N_{kh}^{(n)})π.

Since the left hand side of (4.18) is independent of h we may take

exp {i(β_j − β_k)} = exp {i(m_j − m_k) 2πp/q},    m_1, ..., m_N integral.

Now from (4.17) we see that (writing m_{jk} = m_{jk}^{(1)})

m_{jk} 2πp/q = (m + m_j − m_k) 2πp/q + 2N'_{jk}π,    N'_{jk} integral.


Hence m_{jk} = m + m_j − m_k + N'_{jk} q/p. Thus N'_{jk} q/p must be an integer and
since g.c.d. (p, q) = 1 we must have m_{jk} = m + m_j − m_k + qM_{jk}, M_{jk} integral.
From (4.16) with n = 1 it follows that P(iv) is of the form (4.13).

If f(iv) is the characteristic function of an integral random variable, we may
say that f(iv) is expressed in its lowest terms if f(iv) = e^{imv} g(iv), where g(iv)
has minimal period 2π/q (q integral, q ≥ 1) and m is an integer satisfying
0 ≤ m < q. Analogously, in the matrix case we may say that P(iv) is expressed
in its lowest terms if it is written in the form (4.13) where Q(iv) has minimal
period 2π/q (q ≥ 1) and 0 ≤ m < q. Hence an alternative statement of Theorem 4
is

THEOREM 4'. If P(iv) is expressed in its lowest terms in the form (4.13), then
|α_j(iv)| < 1 (0 < |v| ≤ π; j = 1, ..., N) if and only if q = 1.


5. The probability of tail values of the sums S_n. We use the notation and
definitions of Section 2 and we suppose that each f_{jk}(t) is an analytic moment
generating function of an integral random variable. We suppose also that P(iv),
when expressed in its lowest terms, satisfies the conditions of Theorem 4', i.e.,
if Q(iv) has minimal period 2π/q (q integral, q ≥ 1) where


(5.1)    P(t) = e^{mt} [diag {exp(m_j t)}] Q(t) [diag {exp(m_j t)}]^{-1},


m, m_1, ..., m_N integral, then q = 1. If P(iv) does not satisfy these conditions,
i.e., if q > 1, then we write Q_1(iv) = Q(iv/q). Now Q_1(iv) has minimal period 2π,
and it would be sufficient to study Q_1 instead of Q. Hence it is clearly no loss of
generality to suppose that q = 1. Accordingly, we summarize our assumptions concerning
P(t) as follows:


(5.2)    (i) P(iv) has minimal period 2π in v,
         (ii) P(t) is not reducible to the form (5.1) with q > 1,
         (iii) P(t) is not degenerate.


If a is any real number, then e^{-at}α_1(t) is the maximal positive eigenvalue of
the matrix e^{-at}P(t) and is therefore a strictly convex function for real t. We
choose a so that the matrix e^{-at}P(t) satisfies one of the conditions of Theorem 3
and also so that a > α_1'(0), thus ensuring that e^{-at}α_1(t) attains its unique minimum
at a real, positive, finite value of t. Let


m(a) = inf_{t>0} e^{-at} α_1(t)


and let t*(a) satisfy m(a) = exp{−a t*(a)} α_1(t*(a)). Since α_1'(0) < a we have
t*(a) > 0 and 0 < m(a) < 1. For brevity we write t* = t*(a). We now define
the matrices


Θ_n(a) = {p_{jk}^{(n)} Pr(S_n = na | k_0 = j, k_n = k)}


and


Π_n(a) = {p_{jk}^{(n)} Pr(S_n ≥ na | k_0 = j, k_n = k)},


and our task will be to obtain asymptotic expressions for these as n → ∞. We
shall follow closely the methods used by Blackwell and Hodges [1].
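
The quantities m(a) and t*(a) can be computed numerically for a small example. The sketch below rests on assumptions that are not in the paper: the same kind of two-state chain with ±1 steps as a toy model, the value a = 0.3 (which exceeds the drift α_1'(0) of this toy chain and is below the largest step value 1), and a plain ternary search, which is adequate here because e^{-at}α_1(t) is strictly convex.

```python
import math

# Assumed toy chain: transition matrix P and Pr(X = +1 | transition j -> k).
P = [[0.5, 0.5], [0.3, 0.7]]
Q_UP = [[0.6, 0.2], [0.5, 0.4]]


def perron_root(t):
    """Maximal eigenvalue alpha_1(t) of the 2x2 matrix {p_jk f_jk(t)}."""
    e = [[P[j][k] * (Q_UP[j][k] * math.exp(t) + (1.0 - Q_UP[j][k]) * math.exp(-t))
          for k in range(2)] for j in range(2)]
    trace = e[0][0] + e[1][1]
    det = e[0][0] * e[1][1] - e[0][1] * e[1][0]
    return 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))


def tilted(t, a):
    """The function exp(-a t) * alpha_1(t) whose infimum over t > 0 is m(a)."""
    return math.exp(-a * t) * perron_root(t)


def minimise_tilted(a, lo=0.0, hi=20.0, iters=200):
    """Ternary search for the minimiser t* of a strictly convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if tilted(m1, a) < tilted(m2, a):
            hi = m2
        else:
            lo = m1
    t_star = 0.5 * (lo + hi)
    return t_star, tilted(t_star, a)


a = 0.3                      # assumed level; must exceed the drift alpha_1'(0)
t_star, m_a = minimise_tilted(a)
```

For this toy chain the search returns an interior minimiser `t_star > 0` and a value `m_a` strictly between 0 and 1, in line with the statement above.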

The matrix e^{-at*}P(t*) is non-negative, irreducible and primitive, so that it
has positive right and left eigenvectors x* = (x_j*), y* = (y_j*) respectively such
that y*x* = 1. Let

r_{jk} = e^{-at*} {m(a)}^{-1} (x_j*)^{-1} p_{jk} f_{jk}(t*) x_k*.


Then it follows from Lemma 3.1 that R = (r_{jk}) is the transition matrix of an
ergodic Markov chain with no cyclically moving sub-classes. Let K_n denote
the state at time n in a realization of this chain (n = 0, 1, 2, ...). Let
R(t) = {r_{jk} f_{jk}(t + t*)/f_{jk}(t*)}, i.e.,

(5.3)    R(t) = {m(a)}^{-1} e^{-at*} D^{-1} P(t + t*) D,


where D = diag (x_j*). For each j, k for which p_{jk} > 0, f_{jk}(t + t*)/f_{jk}(t*) is
the moment generating function of an integral random variable. We define a
sequence of random variables Y_1, Y_2, ... associated with the Markov chain
K_0, K_1, K_2, ... in such a way that Y_n has the moment generating function
f_{jk}(t + t*)/f_{jk}(t*) if K_{n-1} = j and K_n = k. Thus Y_1, Y_2, ... are associated
with R(t) in the same way as X_1, X_2, ... are associated with P(t).

Let R^n = (r_{jk}^{(n)}) and T_n = Y_1 + ... + Y_n. If we raise each side of (5.3) to
the power n and equate coefficients of e^{nat} (assuming na to be an integer) we
obtain the relation

(5.4)    p_{jk}^{(n)} Pr(S_n = na | k_0 = j, k_n = k)
             = {m(a)}^n x_j* (x_k*)^{-1} r_{jk}^{(n)} Pr(T_n = na | K_0 = j, K_n = k),


which corresponds to Theorem 1 of Blackwell and Hodges. Further, for any
integer s, we have

(5.5)    p_{jk}^{(n)} Pr(S_n = na + s | k_0 = j, k_n = k)
             = {m(a)}^n x_j* (x_k*)^{-1} e^{-st*} r_{jk}^{(n)} Pr(T_n = na + s | K_0 = j, K_n = k).

Let β_1(t) = α_1(t + t*)/α_1(t*), β_2(t), ..., β_N(t) be the eigenvalues of R(t).
Since β_1'(0) = a, the asymptotic expectation of the increment T_n − T_{n-1} is a,
whereas that of S_n − S_{n-1} is α_1'(0). Thus we have achieved a shift of expectation
similar to that of Blackwell and Hodges and others mentioned in [1].

For each j, k the possible values of Y_1 are identical to those of X_1 and so
|β_1(iv)| < 1 (0 < |v| ≤ π) by Theorem 4'. Since β_1(0) = 1 and [R(iv)]* ≤ R,
it follows from Section 3(c) that |β_j(iv)| < 1 (j = 2, 3, ..., N; −π ≤ v ≤ π)
and hence by continuity that there exists a number η (0 < η < 1) such that
|β_j(iv)| ≤ η (j = 2, 3, ..., N; −π ≤ v ≤ π). Let x(t) = {x_j(t)} and y(t) =
{y_j(t)} be respectively right and left eigenvectors of R(t) corresponding to the
root β_1(t), chosen so that x_j(0) = 1, j = 1, ..., N and


Σ_{j=1}^{N} x_j(t) y_j(t) = 1.


It follows from the Jordan canonical form for R(t) that


(5.6)    {R(iv)}^n = x(iv) y(iv) {β_1(iv)}^n + O(η^n).
Let σ^2 = β_1''(0) − a^2 (= α_1''(t*)/α_1(t*) − a^2) and we have

THEOREM 5. Provided that na is a possible value of S_n, the following asymptotic
matrix relations hold as n → ∞:

(i) Θ_n(a) = {m(a)}^n {σ(2πn)^{1/2}}^{-1} x*y* {I + O(n^{-1})};

(ii) Π_n(a) = {m(a)}^n [σ(2πn)^{1/2} (1 − e^{-t*})]^{-1} x*y* {I + O(n^{-1})}.

PROOF. It follows from (5.6) and the theory of Fourier series that

(5.7)    r_{jk}^{(n)} Pr(T_n = na | K_0 = j, K_n = k)
             = (2π)^{-1} ∫_{-π}^{π} e^{-inav} {β_1(iv)}^n x_j(iv) y_k(iv) dv + O(η^n)
             = (2π)^{-1} ∫_{-π}^{π} B_n(v) dv + O(η^n), say.


Since β_1'(0) = a and β_1''(0) − a^2 = σ^2, we may choose v_0 (> 0) so that

(5.8)    |e^{-iav} β_1(iv)| ≤ 1 − σ^2 v^2/3    (|v| ≤ v_0).


We break up the range of integration in (5.7) into the ranges |v| < n^{-1/2} log n,
n^{-1/2} log n < |v| < v_0, and v_0 < |v| < π. In the first of these ranges we have the
expansions


log [{e^{-iav} β_1(iv)}^n] = −nσ^2 v^2/2 + n Σ_{r=3}^{∞} b_r v^r,

x_j(iv) y_k(iv) = x_k* y_k* + Σ_{r=1}^{∞} c_r v^r    (c_r = c_r(j, k), r = 1, 2, ...),

since, in the latter expansion, y(0)x(0) = 1 and therefore, by Lemma 3.1,
y_k(0) = x_k* y_k*. Thus on setting w = n^{1/2}σv, the integrand in (5.7) may be written,
for |w| < σ log n,


e^{-w^2/2} {x_k* y_k* + w C_1(w^2) n^{-1/2} + C_2(w^2) n^{-1} + o(n^{-1})},


where C_1 and C_2 are polynomials in w^2, depending on j and k but not on n. Using
the result that


∫_{-σ log n}^{σ log n} t^p e^{-t^2/2} dt = 2^{(p+1)/2} Γ{(p + 1)/2} + o(n^{-K})


when p is even and vanishes when p is odd, we have


(2π)^{-1} ∫_{|v| < n^{-1/2} log n} B_n(v) dv = (2πnσ^2)^{-1/2} x_k* y_k* {1 + O(n^{-1})}.


In virtue of (5.8) we have


∫_{n^{-1/2} log n < |v| < v_0} B_n(v) dv = O( ∫_{n^{-1/2} log n}^{∞} exp(−nσ^2 v^2/3) dv ),


which is o(n^{-K}) for all K. In the range v_0 < |v| < π, |β_1(iv)| ≤ ρ, say, (0 < ρ < 1)
and so


∫_{v_0 < |v| ≤ π} B_n(v) dv = O(ρ^n).


Combining these we find finally


r_{jk}^{(n)} Pr(T_n = na | K_0 = j, K_n = k) = (2πnσ^2)^{-1/2} x_k* y_k* {1 + O(n^{-1})}


and the result (i) now follows from (5.4). The result (ii) follows from (5.5) by
summing with respect to s (s = 0, 1, 2, ...).
LIMIT DISTRIBUTIONS IN THE THEORY OF COUNTERS


By G. SANKARANARAYANAN 
Annamalai University, India 


1. Introduction. Let us suppose that the particles that arrive at the counter are
randomly spaced on the positive time axis. In an actual counter each particle
arriving in the time interval (0, ∞), independently of the others, gives rise to an impulse
with probability p or 1 according to whether at this instant there is an
impulse present or not. Hence the counter is in one or other of two mutually
exclusive states which we denote by A and B. The counter is in state A when no
impulse covers the instant and is in state B otherwise, and it assumes the states
A and B alternately. A particle striking the counter is recorded if and only if
the counter is in state A. If p = 0, we get the type I counter and if p = 1, we
get the type II counter.

Let t_1, t_2, t_3, ... be the instants at which particles arrive and x_1, x_2, x_3, ... be
the lengths of successive impulses. Let τ_1, τ_2, τ_3, ... be the instants of successive
recordings. Let us assume that the time from an arbitrary point on the positive
time axis to the time of arrival of the first particle that follows it is a random
variable with distribution function F(x). Hence the differences {t_{r+1} − t_r},
r = 1, 2, 3, ..., are identically distributed random variables independent of
each other with distribution function F(x). Let the time durations of the
impulses be independent and identically distributed random variables with
the distribution function H(x). Let these random variables be independent
of the instants of arrivals and of the events of the realizations of the impulses.
Let ν_t denote the number of registered particles in the time interval (0, t). Let
the process start in state A. Let us denote by ξ_1, η_1, ξ_2, η_2, ... the times spent
in states A and B respectively. Consequently {ξ_n} and {η_n} are independent
sequences of identically distributed positive random variables. It can be seen
that Pr[ξ_n ≤ x] = F(x), x > 0. Let Pr[η_n ≤ x] = U(x), x > 0. It can also
be seen that the instants of transitions A → B coincide with the instants τ_n,
n = 1, 2, 3, .... Hence the time differences {τ_{n+1} − τ_n} are identically distributed
random variables whose distribution function G(x) is given by


(1.1)    G(x) = F(x) * U(x) = ∫_0^x U(x − y) dF(y).


In [8] Takács has shown that ν_t, suitably normalized, is asymptotically normal.
In [9] Takács has considered the asymptotic behavior of ν_t/t. Here he has
also applied the law of the iterated logarithm, as stated by P. Hartman and A.
Wintner [5], to ν_t. In this paper we consider the asymptotic (t → ∞) behaviour of
ν_t when F(x) and H(x) follow the stable laws with suitable characteristic exponents
and we show that ν_t, suitably normalized, tends to the Mittag-Leffler
distribution for all counters of the types detailed above.
2. Definitions and notations. Let
@ 
r(s) - | e dF (zx) 
0 


and 


so that 


“* dG(x) = r(s)-w(s). 


Pr[m & 2] = F(z). 
Let 
W(t,n) = Pr[vy, <n] 
(2.1) 1 — Pr[tasi S #] 
1 — F(t)*G,(t), 


where G,(t) is the n-fold convolution of G(t) with itself and Go(t) 
and 0 otherwise. So 


Pr [v, = n| = F(t)*G,4(t) — F(t)*G,(t). 


Let m,(t) be the kth moment of », 


x 


m,(t) = >, n*[F(t)*G,4(t) — F(t)*G,(t)] 


n=1 


> [(n + 1)* — n*‘]-[1 — Wt, n)]. 


n= 


* dm(t) = s[ e  m,(t) dt 
0 


- s/ e* > [(n+1)' —n')-[1 — W(tn)]dt 
0 n=0 


=sd lint ut—n' [ et —wien)lat 
0 


n=0 


= r(s) >) [(n+1)* — n'][y(s)]". 


Interchange of summation and integration is valid since all the terms on the 
right side of (2.2) are positive. Let N(t) be the number of particles arriving at
the counter in the time interval (0, t). Let

Q_n(t) = Pr[N(t) = n].

So

Pr[N(t) ≤ n] = Pr[t ≤ t_{n+1}]
             = 1 − F_{n+1}(t),


where F_{n+1}(t) is the (n + 1)-fold convolution of F(t) with itself. So Σ_{j=0}^{n} Q_j(t) =
1 − F_{n+1}(t) and hence


(2.5)    Q_n(t) = F_n(t) − F_{n+1}(t).


3. Type II counter. Consider the type II counter. Let P,(t) denote the prob- 
ability that at the instant ¢, the system is in state A. 


(2.4) 


P,(t) = Pr (¢ < 71) + Pr (te — & < t < m2) + Pr(r3 — & <t < 73) + eee 
Pr (¢ < 7) + [Pr (t< re) —Pr(ts&+™m)) 
+ [Pr(t<73)—Pr(tSiitmt&itm)|+-::: 


2X, Ga(t) — > G,(t)*F(t). 

n= n=(0 

P(t) can also be got as follows. If we know that in the time interval (0, t)
exactly n particles arrive at the counter (the probability of which is Q_n(t)),
then the occurrence points of these n events may be regarded as independent
uniformly distributed points in the time interval (0, t). The probability that an
impulse started at a random point will end before the instant t is p_t =
(1/t) ∫_0^t H(x) dx. Because of independence the probability that all the n impulses
started at random points will end before the instant t is (p_t)^n. So


P(t) = Σ_{n=0}^{∞} Q_n(t) (p_t)^n

     = Σ_{n=0}^{∞} [F_n(t) − F_{n+1}(t)] (p_t)^n

     = 1 − (1 − p_t) Σ_{n=0}^{∞} F_{n+1}(t) (p_t)^n.


Also using (3.1), we see that
$$\int_0^\infty e^{-st} P_A(t)\, dt = \int_0^\infty e^{-st}\Big[\sum_{n=0}^{\infty} G_n(t) - \sum_{n=0}^{\infty} G_n(t)*F(t)\Big]\, dt$$
$$= (s^{-1})\{(1 - \gamma(s))^{-1} - r(s)/(1 - \gamma(s))\} = [1 - r(s)]/s[1 - \gamma(s)]. \tag{3.3}$$
Hence
$$\gamma(s) = 1 - (1 - r(s))\Big[s\int_0^\infty e^{-st} P_A(t)\, dt\Big]^{-1}. \tag{3.4}$$


4. A lemma. We now state a lemma which can be got as the converse of a
theorem ([4], Theorem 108, p. 168) by an obvious change of variables.


LEMMA. Let L(x) be such that $L(cx) \sim L(x)$ for every positive c (as $x \to \infty$).
Then


2 


(4.1) > ane” w r*L(N)P(a +1), asrt>0,0>0,a>0, 
n=0 

of 

(42) A(n) =a +a+a+t+-::: +a, ~ (1/0)n™L(n’), asn—> o, 
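In particular, if $\rho = 1$ and $e^{-\tau} = x$, we get
$$\sum_{n=0}^{\infty} a_n x^n \sim (\log 1/x)^{-\alpha}\, L\{[\log(1/x)]^{-1}\}\, \Gamma(\alpha + 1), \quad \text{as } x \to 1, \tag{4.3}$$
if
$$A(n) = a_0 + a_1 + \cdots + a_n \sim n^{\alpha} L(n), \quad \text{as } n \to \infty. \tag{4.4}$$


Using this lemma in (2.3), we get
$$\int_0^\infty e^{-st}\, dm_k(t) = r(s)\sum_{n=0}^{\infty} [(n+1)^k - n^k]\,[\gamma(s)]^n \sim r(s)\,[\log\{1/\gamma(s)\}]^{-k}\, \Gamma(k + 1), \quad \text{as } s \to 0. \tag{4.5}$$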
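The step from (2.3) to (4.5) applies the lemma with $a_n = (n+1)^k - n^k$, so that $A(n) = (n+1)^k \sim n^k$, i.e. $\alpha = k$ and $L \equiv 1$. The sketch below checks this special case numerically; the value x = 0.999 and the truncation length are arbitrary illustrative choices, and x plays the role of $\gamma(s)$.

```python
import math


def power_series(x, k, n_terms=50000):
    """Truncated sum of [(n+1)^k - n^k] * x^n over n = 0 .. n_terms - 1."""
    total, xn = 0.0, 1.0
    for n in range(n_terms):
        total += ((n + 1) ** k - n ** k) * xn
        xn *= x
    return total


def lemma_prediction(x, k):
    """Right side of (4.3) with alpha = k and L identically equal to 1."""
    return math.gamma(k + 1) * math.log(1.0 / x) ** (-k)


x = 0.999
for k in (1, 2, 3):
    # The ratio should approach 1 as x tends to 1.
    print(k, power_series(x, k) / lemma_prediction(x, k))
```

The printed ratios should be close to 1, and closer still as x is moved toward 1 (with a correspondingly longer truncation).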


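5. Asymptotic behavior of $P_A(t)$ in a type II counter. We prove the following
theorem.
THEOREM 5.1. In a type II counter, if F(t) has a characteristic function $\phi(\lambda)$ of form
$$\log \phi(\lambda) = c\,\Gamma(-\alpha)\,[\cos(\pi\alpha/2) - i(\lambda/|\lambda|)\sin(\pi\alpha/2)]\,|\lambda|^{\alpha} \tag{5.1}$$
with c > 0 and $0 < \alpha < 1$ and further if $p_t$ be chosen such that
$$1 - p_t \sim \lambda t^{-\beta} \quad \text{as } t \to \infty, \tag{5.2}$$
where $\beta < \alpha$ and $\lambda$ are constants, then
$$P_A(t) \sim (c/\lambda\alpha)\, t^{\beta - \alpha}, \quad \text{as } t \to \infty. \tag{5.3}$$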


Remark I. In [3] it is shown that the characteristic function $\phi(\lambda)$ of the stable
law is of the form
$$\log \phi(\lambda) = i\gamma\lambda + c\,\Gamma(-\alpha)\,[\cos(\pi\alpha/2) + i\delta(\lambda/|\lambda|)\sin(\pi\alpha/2)]\,|\lambda|^{\alpha} \tag{5.4}$$
where $0 < \alpha < 1$, $-1 \le \delta \le 1$, and $\gamma$ is any real number. In particular,
if $\gamma = 0$ and $\delta = -1$ ([6] p. 234), (5.4) represents the characteristic function of
a positive random variable. In Theorem 5.1 we have taken a distribution function
with this characteristic function. This stable law for F(t) corresponds to well
known recurrent processes [10, 11] which describe the arrivals of particles to
the counter.


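Remark II. If $H(x) \sim 1 - \lambda(1 - \beta)x^{-\beta}$, as $x \to \infty$, then $p_t$ has the property
$$1 - p_t \sim \lambda t^{-\beta}, \quad \text{as } t \to \infty.$$


Proof. In our case [6] F(t) can be put in the form
$$F(t) = 1 - [c/\alpha + \varepsilon(t)]\,(1/t^{\alpha}), \quad 0 < t < \infty, \tag{5.5}$$
where $\varepsilon(t) \to 0$ as $t \to \infty$. Also
$$F_n(t) = F(t/n^{1/\alpha}) = 1 - [nc/\alpha + n\,\varepsilon(t/n^{1/\alpha})]\,(1/t^{\alpha}). \tag{5.6}$$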
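From (3.3)
$$P_A(t) = 1 - (1 - p_t)\sum_{n=1}^{\infty} F_n(t)\,(p_t)^{n-1} = (1 - p_t)\sum_{n=1}^{\infty} [1 - F_n(t)]\,(p_t)^{n-1}$$
$$= c(1 - p_t)\,\alpha^{-1} t^{-\alpha}\sum_{n=1}^{\infty} n\,(p_t)^{n-1} + (1 - p_t)\,t^{-\alpha}\sum_{n=1}^{\infty} n\,\varepsilon(t/n^{1/\alpha})\,(p_t)^{n-1} = I + II,$$
where
$$I = c(1 - p_t)\,\alpha^{-1} t^{-\alpha}\sum_{n=1}^{\infty} n\,(p_t)^{n-1} = c/\{\alpha t^{\alpha}(1 - p_t)\},$$
$$II = (1 - p_t)\,t^{-\alpha}\sum_{n=1}^{\infty} n\,\varepsilon(t/n^{1/\alpha})\,(p_t)^{n-1}.$$


Consider the sum
$$R(t) = (1 - p_t)^2 \sum_{n=1}^{\infty} n\,\varepsilon(t/n^{1/\alpha})\,(p_t)^{n-1}. \tag{5.7}$$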


Using the theorem in the appendix, when $\alpha \ne 1/2$,


eo 
\2 r—ly pr! (1—a@) ,—a?/(1—a) 
R(t) S (1 — pe)® Den(ps)” {Kin rere? + 
n=1 
Kin" t” (-a)—all+a)/ a) + 
Kin! 5-2 (l—a) 4. 
3 


i x! (2—a)/(l—a) ,—a(2—a@)/ (l—a) 
int a 4. Kin a a a(2—a m 
where $K_i$, i = 1, 2, 3, 4, 5 are constants independent of t. So


R(t) S (1 — ps) (Kit) Sn! (p,)"* + 
1 


© 
in re ae wre f- 
1 


x 


(Ki/t > n _(2—a)/(l—a) (p:)*— + 


1 


t*) >> n*(p.)"" 
1 


7! pr? a)/(1—a) (8—2.a)/ (— ~~ n—l 
(Ks/t yon (p.)"} 


i 

2 Ki t*” a- » .f 1/( wn a3" oh + 
{Ke/t0OTr'"?) 41/1 — pr} + 
(Kite w/a — po (- “) + 
{K4/t%} -{1/(1 — pe)*} + 
(Ki /t2? a) (—a), 41/1 — p) G-a)) 


t—- @ 


, 


2 


(Kafr) 11/2 — pe) Be) + 
{Ky /t00r'O") f/(1 — pe) Orr} + 
{K3/t °°} -{1/(1 — pe} + 
{Ko/t%}-{1/(1 — pe)} + 
(Ki /ee2- 1 G-o) . (1/ (1 — pr) (2—a)/(1- t— ©. 
Since 1 — pp ~ At”, t 
, /{r* a eaten B) oe +. 
ae al a) ,A+a)(a-$) is + 
er coo se 


uo} + 


rt 464 (2—a)/(1+a) ,(2—a) (a—B)/ (1—a) 
/{y a 7s a a }. 


Hence when $\beta < \alpha$,
$$R(t) \to 0 \quad \text{as } t \to \infty.$$


When $\alpha = 1/2$, using the theorem in the appendix,


R(t) s (1 — p,)? > n(p,)"" 


n=1 


[Ki (t/n?)*) + Kz (t/n®?)* + Ky (t/n?)* + Ki(n/t) (log t — 2 log n)] 


< (1 — p.)? Do n(ps1)”"™ 


n=1 
[Ki (n/t!) + K2(n*/t') + K(n?/t) + Ki(n log t)/t! — K{(2 n log n)/t'] 


where $K_i'$, i = 1, 2, 3, 4 are constants independent of t. So


R(it)sA- p1) [(K1/t) > n°(p.)" ‘+ (Kz /t') >, n‘(p:)" —_- 
1 1 


(K3; t) a. n*(p.)" + (KG log t)t* 7 n'(p.)"— _ 
1 1 


(2 Ki/t) & n? log n(p,)""J. 
1 


(Ki/t')-{1/(1 — ps)} + (K2/t)-{1/(1 — pa)} + 
(K3/t)-{1/(1 — ps)*} + (Ki (log t)/t)-{1/(1 — ps} — 
(2 Ki/t)-{1/(1 — pe)"*4. 

So 
gy Rit) S Kr/{MO} + Kz /{neo) + K3/{wio™} + 
ve Ki (log t)/{ne2} — 2 KG/{n ree Perey, 
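Here $\epsilon$ is a small positive number which can be taken as small as we like so that
$\epsilon < (1 - 2\beta)/2\beta$. So when $\beta < 1/2$, $R(t) \to 0$ as $t \to \infty$. Hence whatever be the
value of $\alpha$ in $0 < \alpha < 1$, when $\beta < \alpha$, $R(t) \to 0$ as $t \to \infty$. So when $\beta < \alpha$,
II in (5.8) is negligible compared to I. Hence
$$P_A(t) \sim \{c/(\alpha t^{\alpha})\}\cdot\{1/(1 - p_t)\} \sim \{c/(\lambda\alpha)\}\, t^{\beta - \alpha}, \quad t \to \infty. \tag{5.10}$$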
6. Asymptotic behavior of $\nu_t$ in a type II counter.
THEOREM 6.1. With the same assumptions on F(t) and $p_t$ as in Theorem 5.1,
we have
$$\lim_{t \to \infty} \Pr\left[\frac{\lambda\,\nu_t}{t^{\beta}\,K(\alpha, \beta)} \le x\right] = g_\beta(x), \tag{6.1}$$
where
$$g_\beta(x) = [1/(\pi\beta)] \int_0^x \sum_{j=1}^{\infty} [(-1)^{j-1}/j!]\, \sin(\pi\beta j)\, \Gamma(\beta j + 1)\, y^{j-1}\, dy \tag{6.2}$$
and
$$K(\alpha, \beta) = \Gamma(1 - \alpha + \beta)/\Gamma(1 - \alpha). \tag{6.3}$$
Proof. From (5.3), we have
$$P_A(t) \sim \{c/(\lambda\alpha)\}\, t^{\beta - \alpha}, \quad t \to \infty. \tag{6.4}$$
Also $P_A(t) = 1$ when t = 0. So


s [ e “P,(t) dt = or espe a+ nt : e "P,(t) dt 
0 to 0 


= [(cs) Qa)] fe errs 


“sean “cate | e"'P,(t) dt. 


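In (6.5), the second integral is less than $t_0^{1-\alpha+\beta}/(1 - \alpha + \beta)$ and the third
is less than $M(1 - e^{-st_0})\,s^{-1}$ where M is a constant. So
$$s\int_0^\infty e^{-st} P_A(t)\, dt \sim \frac{c\,\Gamma(1 - \alpha + \beta)}{\lambda\alpha\, s^{-(\alpha - \beta)}}. \tag{6.6}$$
It follows from (5.1) that
$$r(s) = \int_0^\infty e^{-st}\, dF(t) = \exp[-(c/\alpha)\,\Gamma(1 - \alpha)\,s^{\alpha}].$$
Using (3.4),
$$\gamma(s) \sim 1 - \frac{1 - \{1 - (c/\alpha)\,\Gamma(1 - \alpha)\,s^{\alpha}\}}{[c\,\Gamma(1 - \alpha + \beta)]/[\lambda\alpha\, s^{-(\alpha - \beta)}]} = 1 - \frac{\lambda\,\Gamma(1 - \alpha)\,s^{\beta}}{\Gamma(1 - \alpha + \beta)}.$$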


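Using (4.5) we have
$$\int_0^\infty e^{-st}\, dm_k(t) \sim \{1 - (c/\alpha)\,\Gamma(1 - \alpha)\,s^{\alpha}\}\, \frac{\Gamma(k + 1)}{\{\lambda\,\Gamma(1 - \alpha)\,s^{\beta}/\Gamma(1 - \alpha + \beta)\}^k} \sim \frac{\Gamma(k + 1)}{\{\lambda\,\Gamma(1 - \alpha)\,s^{\beta}/\Gamma(1 - \alpha + \beta)\}^k}.$$
Using Karamata's Tauberian theorem [12], we get
$$m_k(t) \sim \frac{\Gamma(k + 1)\, t^{\beta k}}{\{\lambda\,\Gamma(1 - \alpha)/\Gamma(1 - \alpha + \beta)\}^k\, \Gamma(\beta k + 1)} = \left[\frac{K(\alpha, \beta)\, t^{\beta}}{\lambda}\right]^k \frac{\Gamma(k + 1)}{\Gamma(\beta k + 1)}, \tag{6.9}$$
where $K(\alpha, \beta)$ and $\beta$ are given by (6.3) and (5.2). That is
$$E\left\{\left[\frac{\lambda\,\nu_t}{K(\alpha, \beta)\, t^{\beta}}\right]^k\right\} \to \frac{\Gamma(k + 1)}{\Gamma(\beta k + 1)}. \tag{6.10}$$
Hence
$$\lim_{t \to \infty} \Pr\left[\frac{\lambda\,\nu_t}{t^{\beta}\,K(\alpha, \beta)} \le x\right] = g_\beta(x), \tag{6.11}$$
where $g_\beta(x)$ is defined by (6.2).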
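The limiting moments $\Gamma(k+1)/\Gamma(\beta k + 1)$ in (6.10) are those of the Mittag-Leffler distribution. As an illustrative check, not part of the original argument, the special case $\beta = 1/2$ can be compared with the absolute moments of a normal variable with mean 0 and variance 2, which agree with them by the duplication formula for the gamma function. The function names below are hypothetical.

```python
import math


def mittag_leffler_moment(k, beta):
    """Limiting k-th moment Gamma(k + 1) / Gamma(beta * k + 1), as in (6.10)."""
    return math.gamma(k + 1) / math.gamma(beta * k + 1)


def half_normal_moment(k, variance):
    """E|X|^k for X normal with mean 0 and the given variance."""
    return (2.0 * variance) ** (k / 2.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)


for k in range(1, 6):
    print(k, mittag_leffler_moment(k, 0.5), half_normal_moment(k, 2.0))
```

For other values of $\beta$ no such elementary comparison is available, and the series (6.2) would have to be evaluated directly.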


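7. Type I counter. For a type I counter, U(x) = H(x). So $\gamma(s) = r(s)\,W'(s)$
where $W'(s) = \int_0^\infty e^{-sx}\, dH(x)$. Assuming that $m = \int_0^\infty x\, dH(x)$ exists and F(t)
has the same form as in Theorem 5.1, we find that
$$\gamma(s) = r(s)\,W'(s) = \exp[(-c/\alpha)\,\Gamma(1 - \alpha)\,s^{\alpha}]\cdot\{1 - sm + \cdots\} \sim 1 - (c/\alpha)\,\Gamma(1 - \alpha)\,s^{\alpha}. \tag{7.1}$$
Hence in this case,
$$\int_0^\infty e^{-st}\, dm_k(t) \sim \frac{\Gamma(k + 1)}{\{(c/\alpha)\,\Gamma(1 - \alpha)\,s^{\alpha}\}^k}. \tag{7.2}$$
Using Karamata's theorem [12],
$$m_k(t) \sim \frac{t^{\alpha k}}{\{(c/\alpha)\,\Gamma(1 - \alpha)\}^k}\cdot\frac{\Gamma(k + 1)}{\Gamma(\alpha k + 1)}. \tag{7.3}$$
Hence we have


THEOREM 7.1. For a type I counter with F(t) as in Theorem 5.1 and an H(x)
having the first moment,
$$\lim_{t \to \infty} \Pr\left[\frac{(c/\alpha)\,\Gamma(1 - \alpha)\,\nu_t}{t^{\alpha}} \le x\right] = g_\alpha(x), \tag{7.4}$$
where $g_\alpha(x)$ is defined by (6.2).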


8. Actual counter. In the actual counter described in Section 1, 


(8.1) P,(t) = 2d, Q5(t) (pe)? 
where


(8.2) => Q(t )(") via 


We can write P,(t) in the form 


P,(t) = > Q,(t)q" + sep - Q,.( 


n=0 n=l 


(pp.)° 
ra >. Q(t 
= (: 1+ 4 ole SE Q a 
| dq 2! dq? n==() : 


= (14 2m 4 4 (pp) d )( -a-@% $e 
(: oT 2 +r at 1 1 —q) dL Fala 


1 — [1 — (q + pp,)] » F,(t)(q + pp)” 


Hence we have the following theorem.
THEOREM 8.1. With the same assumptions on F(t) and $p_t$ as in Theorem 5.1,
in the actual counter,
$$P_A(t) \sim [c/(\alpha\lambda p)]\, t^{\beta - \alpha}, \quad t \to \infty. \tag{8.4}$$


9. Asymptotic behavior of $\nu_t$ in the actual counter. By using the same method
as the one used by Takács in [9], we can deduce that for the actual counter,


1 (l — r’(s)) ( Pr cl a ” 
(9.1) y(s) = r(s) | —~ — +— is | e'P,(t) dt 
r’(s) r’(s) | Jo ) 


where $r'(s) = (p\,r(s))/(1 - q\,r(s))$ is the Laplace transform of
$$K(t, q) = (1 - q)\sum_{r=1}^{\infty} F_r(t)\, q^{r-1},$$
which can be verified to be a distribution. It can also be seen that
$$\sum_{j=0}^{m} Q_j^*(t) = 1 - K_{m+1}(t, q),$$
where $K_{m+1}(t, q)$ is the (m + 1)th convolution of K(t, q) with itself. Taking
F(t) and $p_t$ as in Theorem 5.1, with $\alpha = \beta$ and $\alpha\lambda p > qc$, we find that
$$\gamma(s) \sim 1 - [(\alpha\lambda p - qc)/(\alpha p)]\,\Gamma(1 - \alpha)\,s^{\alpha}. \tag{9.2}$$


Hence we have the following theorem.
THEOREM 9.1. With the same assumptions on F(t) and $p_t$ as in Theorem 6.1,
with $\alpha = \beta$ and $\lambda\alpha p > qc$, in the actual counter,
$$\lim_{t \to \infty} \Pr\left[\frac{(\alpha\lambda p - qc)\,\Gamma(1 - \alpha)\,\nu_t}{\alpha p\, t^{\alpha}} \le x\right] = g_\alpha(x). \tag{9.3}$$
APPENDIX


Here we give a brief sketch of the proof leading to the results in (5.7). At the
outset we state a Lemma (the proof of which is fairly simple) which we use in
the proof of the accompanying theorem.

Lemma. Consider the integral 


6 
(1) I - [ er a ce > 7 > @ T> 0. 
0 
Ifp> 1, 


I = [(aT)* ”'"/y]-[T(8 — 1)/y)] + Ki(T, 6)T], 
If 6 <1, 
I = K,(T, 8), 
and if 6B = 1, 


where for all T in the range 0 < T < ~, 


KT, 8) < K;(8), 


Note $K_i(\delta)$ depends only on $\delta$ and not on T.
We now prove the following theorem concerning stable laws.
THEOREM. Consider a distribution function F(t) with Laplace transform
$$r(s) = \exp\{[-c\,\Gamma(1 - \alpha)/\alpha]\,s^{\alpha}\}, \quad c > 0, \quad 0 < \alpha < 1.$$
Then, F(t) can be put in the form $1 - F(t) = t^{-\alpha}[(c/\alpha) + \varepsilon(t)]$ where


e(t) < [Kit” (l-—#) } goes (l—a) 4 cs? (l—a@) f- Kit * 4 
9 
(2) Kit a(2—a)/(1—a) 


where $\alpha > 1/2$ or $\alpha < 1/2$, and
(3) e(t) < [Kit? + Kit? + Kt" + KZ (log t)t”) 


when $\alpha = 1/2$. $K_i$ and $K_i'$, i = 1, 2, 3, ... are positive constants independent of t.
Proof. In [6] Mikusiński has shown that


g(t) (1/2mi) [ e** ds, (O<t< ~;0 <a <1) 


(1/m)[a/(1 — a@))(1/t) [ ue “ dd, 
0 


where 


= c* (l-a 


(sin ad/sin ¢)*"~® -[sin [(1 — a)¢]/sin 9). 
Put $T = t^{-\alpha/(1-\alpha)}$ so that $t = T^{-(1-\alpha)/\alpha}$ and
$$v_1(\phi) = \frac{(\sin \alpha\phi)^{\alpha/(1-\alpha)}\, \sin\{(1 - \alpha)\phi\}}{(\sin \phi)^{1/(1-\alpha)}},$$
so that $u = T v_1(\phi)$. $v_1(\phi)$ and $v_1'(\phi)$ are continuous and bounded in $(0, \pi/2]$ and
$v_1(\phi) > 0$ in $[0, \pi/2]$.
Let 


w(d) = n(r —¢) = go 
w(@) and w’(@) are bounded and continuous in [0, +/2] and w(¢) > 0 in [0, r/2]. 
Let G(t) = ft ue “do so that g(t) = {a/[r(1 — a)é}}G(t). 

We write G(i) = G,(t) + G.(t) where 


wd). 


G,(t) = r | vi(o)e'"™ do. 
0 
Using the inequality 0 < 1 — e* < z,xz > 0, we have 
G,(t) = T If , v1(>) do + K(T)T | 
0 


= T[A + K,(T)T}j 
where A = fj? v,(o)dd and |K,(T)| < A’ forall TinO < T < co, A and A’ 
being constants independent of 7’, and 


G,(t) = T [ " wlevs TH) dy, 
0 


In the following analysis we use frequently functions of the form K,(T, 5), 
A,(T, 6), BT, 8), Cit, 6), Ci(t, 6), C7 (t, 6), Dy(t, 8), Di(t, 8), D7 (t, 8), i = 
1,2,3 --- . These functions which depend upon T7' (or t) and 6 have the property 
that their absolute value will be less than a constant which is independent of 7’ 
(or ¢), but may depend upon 6 for all values of T(ort) inO < t(orT) < o. 

We now write G2(t) = G;(t) + Gs(t) where 


x/2 

G;(t) ot 7 | ti) yl “woe -To~ 11 (1-@) wig) de 
é 

= K;(T, 6)T, 
and 

; ~a0¢i— 

Gi) = TL gM Pw(gyeteP deg, 
0 


We now write G,(t) = Gs(t) + Ge(t) + Gr(t) where 


Gs(t) = T fg w(oye te" dg 
0 


(1 — a)[Tw(o)]/ (l(a) + Ke(T, 6)T] 


5 
Tw(o) | [ee To~1!/ (1—-@) { w(o) —w(0) “— 1] do. 
0 
Using the inequality |1 — e*| < ze'*!|z > O we get
5 
| Ge(t) | = Ky(T, aT | gi 2/ a) 5 rag 1 (1a) de. 
0 


Here a can be equal to w(o)/2 and 4 is chosen such that in the interval 0 < o <4, 
[w(o) — K¢] = w(o)/2, 


K being the lowest upper bound of |w’(¢)| in0 < ¢ < #/2. 
Using the Lemma we have 


| Ge(t) | = [K3(T, 5) + K,(T, 6)T)T"** 
G;(t) =T [ gh VOM ay (ge Po oma) do 
0 
where w;(¢) = [w(¢) — w(o)]/¢ is bounded. So 
8 
Gi(t) = Kyw(T,a)7 [ gmetwe to ag 
0 
where a’ is the L.U.B of w(¢) in [0, 2/2], which is positive. Hence 
(Ku(T, 8)T*** + Ky(T, 8)T* if 1/(1— a) >2ore 
Gi(t) = < Kn(T, 6)T if 1/(l1— a) <2orea 
\Ku(T, 6)Tflog (1/T) + Ky(T, 8)] if 1/(1 — a) = 2ora 


Collecting the various terms we can express G(t) in the form 


1 
2 
i 
2 
1 
2: 


= (1 — a)I(a)[w(o)T}* + Ai(T, 8)T + As(T, 6)T?+ 
A,(T, 6)T** + A,(T, 6)T** + A,(T, 8)T***, 
ifea>} or a<} 
and 
G(t) = (4)1(4)[w(o)T} + BT, 6)T + Bo(T, 8)T? + B,(T, 8)T! + 
B,(T, 6)T log (1/T), ifa = }. 


(6) 


That is 
G(t) = (1 — a)T(a)[w(o)!t* + Cy(t, a -? 
(7) C(t, gr teee a. C4(t, (22-02 
Cit, 6) 2229-9 4 Ot, 8) 47 Peale) 


G(t) = (V/2)[w(o)]'t* + Dy(t, 8)t* + Det, 8)? + 
D;(t, 5)t*? + Dy(t, 6) (log t)/t 
—l1/(l-a@


g(t) (a/x)t *"[w(o)] “T(a) + Ci(t, d)t 
(9) Carer fair err 
Gi Ger rrr ae ren” 82 
where 
C(t, 6) = (a/[x(1 — a)])Ci(t, 8), 
and 
g(t) = (AvV/x)[w(o) ft? + Diz, 6)? + Dit, 6) 
oT + D3(t, )t' + Di(t, 5)t log t ifa = } 
where 
Di(t, 6) = (1/r)D,(t, 8) 


If e is the Laplace transform of g(t), then 


g(t) = (1 ani) | e*™ ds. 


The frequency function whose Laplace transform is e ™" is given by 
(11) f(t) = (1/m"'*)g(t/m"*) 


In our case for the distribution considered in (5) m = [eI'(1 — a@)]/a. 
We first consider the case when $\alpha < 1/2$ or $\alpha > 1/2$ ($0 < \alpha < 1$). Using (9) and
(11), f(t) can be put in the form


f(t) = at - C” (t, 8)t7 (l—a) + 
(12) ot oer (l—a) a cus?" (l—a) 
a Tar 


Now 
1 — F(t) - | f(t) dt. 


Using (12), after integration, F(t) can be put in the form 
(13) 1 — F(t) = (c/a)t * +t “e(t) 
where 
e(t) < xs (l—a) + no (l—a) 7 ere fp 
(14) r! —a xt —a(2—a)/(1—a) 
Kat + Kst ~ 


where Kj are constants independent of t for 0 <t < ~. 
In the same manner, when $\alpha = 1/2$, using (10) and (11), f(t) can be put in the
form


f(t) = cf? + Di (t, 6)t? + DI (t, 8)t* + DF (t, 8)t* + 


(15) 
Di (t, 6)t*flog t — (4) log m] 


Using (15), after integration, F(t) can be put in the form 
1 — F(t) = ct + te(t) 
where 
(16) e(t) < Kit? + Kit* + Kgt™ + Ki (log t)/t 


where $K_i'$, i = 1, 2, 3, 4 are constants independent of t in $0 < t < \infty$. Hence
the theorem.
THE TRANSIENT BEHAVIOR OF A SINGLE SERVER QUEUING PROCESS
WITH RECURRENT INPUT AND GAMMA SERVICE TIME¹


By Lajos Takács
Columbia University


1. Introduction. Let us consider the following queuing process: Customers arrive
at a counter at the instants $\tau_0, \tau_1, \cdots, \tau_n, \cdots$ where the interarrival
times $\theta_n = \tau_{n+1} - \tau_n$ ($n = 0, 1, \cdots$; $\tau_0 = 0$) are identically distributed, mutually
independent, positive random variables with distribution function
$P\{\theta_n \le x\} = F(x)$. We say that $\{\tau_n\}$ is a recurrent process. The customers will
be served by a single server. The server is idle if and only if there is no customer
waiting at the counter, otherwise the order of the services is irrelevant. The
service times are identically distributed, mutually independent random variables
with the distribution function
$$H_m(x) = \begin{cases} 1 - \sum_{j=0}^{m-1} e^{-\mu x}\,\dfrac{(\mu x)^j}{j!} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases} \tag{1}$$
and independent of $\{\tau_n\}$.

We are interested in the investigation of the stochastic behavior of the queue 
size and the busy period of this process. We shall see, however, that if we know 
the stochastic behavior of the process defined below, then that of the above 
process can be deduced immediately. 

To define the second process let us suppose that customers arrive at a counter
in batches of size m at the instants $\tau_0, \tau_1, \cdots, \tau_n, \cdots$, where $\{\tau_n\}$ is the recurrent
process defined above. There is a single server. The server is idle if and
only if there is no customer waiting at the counter, otherwise the order of the
services is irrelevant. The service times are identically distributed, mutually
independent random variables with the distribution function
$$H(x) = \begin{cases} 1 - e^{-\mu x} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases} \tag{2}$$
and independent of $\{\tau_n\}$.

Denote by $\xi(t)$ the queue size at the instant t, i.e., $\xi(t)$ is the number of customers
waiting or being served at the instant t. We say that the system is in
state $E_k$ at the instant t if $\xi(t) = k$. Further define $\xi_n = \xi(\tau_n - 0)$, i.e., $\xi_n$ is
the queue size immediately before the arrival of the nth batch ($n = 0, 1, \cdots$).

Received October 20, 1959; revised May 15, 1961. 
Research under Contract Number Nonr-266 (33), Project Number NR 042-034. Reproduction
in whole or in part is permitted for any purpose of the United States Government.

If we identify the arrivals of the batches of size m with the arrivals of individual
customers and the total service time of a batch with the service time of
an individual customer then the second process reduces to the first one. For, the
distribution function of the total service time of a batch in the second process is
equal to $H_m(x)$, the mth iterated convolution of H(x) with itself.

If we consider the first process then the busy period follows the same probability
law as in the second process, but the queue size will change to
$[(\xi(t) + m - 1)/m]$.
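Reading the square brackets as the integer part, the expression $[(\xi(t) + m - 1)/m]$ counts each batch that has at least one unserved member as one customer of the first process. A minimal sketch under that reading; the function name is hypothetical.

```python
def first_process_queue_size(xi, m):
    """Queue size in the first process when the second process holds xi customers.

    Integer part of (xi + m - 1) / m, i.e. xi / m rounded up.
    """
    return (xi + m - 1) // m


# With batches of size m = 3: queue sizes 0..7 in the second process.
print([first_process_queue_size(xi, 3) for xi in range(8)])
```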

Remark. If we suppose in particular that the batches will be served in the
order of their arrival and if $\eta(t)$ denotes the virtual waiting time at the instant t,
i.e., the time which the first customer in a batch would wait if the batch joined
the queue at the instant t, then we have
$$\eta(t) = \sum_{i=1}^{\xi(t)} \chi_i, \tag{3}$$
where $\{\chi_n\}$ is a sequence of identically distributed, independent random variables
with distribution function H(x) and independent of $\xi(t)$. In this case the waiting
time in the first process follows the same probability law as in the second process.

In what follows we shall consider only the second process and determine the 
stochastic behavior of the queue size and that of the busy period. 

The asymptotic behavior of the queue size and that of the waiting time have
been investigated already by F. Pollaczek [4], Chapter 7, D. M. G. Wishart [6],
and F. G. Foster [2]. The stochastic law of the busy period has been given by
B. W. Conolly [1] and it can be deduced from a general theorem of F. Pollaczek
[3].


2. An auxiliary theorem. Denote by
$$\varphi(s) = \int_0^\infty e^{-sx}\, dF(x)$$
the Laplace-Stieltjes transform of F(x) and let
$$\alpha = \int_0^\infty x\, dF(x).$$
Throughout this paper we use

LEMMA 1. If (a) $\Re(s) \ge 0$, $|w| < 1$ or (b) $\Re(s) > 0$, $|w| \le 1$ or (c) $\mu\alpha > m$
and $\Re(s) \ge 0$, $|w| \le 1$ then the equation
$$z^m = w\,\varphi(s + \mu(1 - z)) \tag{4}$$
has exactly m roots $z = \gamma_r(s, w)$, $r = 1, 2, \cdots, m$, in the unit circle $|z| < 1$. We
have
$$\gamma_r(s, w) = \sum_{j=1}^{\infty} \frac{(-\mu)^{j-1}}{j!}\,(\epsilon_r w^{1/m})^j\, \frac{d^{j-1}}{ds^{j-1}}\,[\varphi(s + \mu)]^{j/m}, \tag{5}$$
where $\epsilon_r = e^{2\pi r i/m}$, $r = 1, 2, \cdots, m$, are the mth roots of unity.
Proof. In cases (a) and (b) we have $|w\,\varphi(s + \mu(1 - z))| < (1 - \epsilon)^m$ if
$|z| = 1 - \epsilon$ and $\epsilon$ is a sufficiently small positive number. In case (c) we have
$\varphi(\mu\epsilon) < (1 - \epsilon)^m$ if $\epsilon$ is a sufficiently small positive number. For, if $0 \le \epsilon \le 1$
then $\varphi(\mu\epsilon)$ and $(1 - \epsilon)^m$ are monotone decreasing functions of $\epsilon$, they agree at
$\epsilon = 0$ and their right-hand derivatives at $\epsilon = 0$ are $-\mu\alpha$ and $-m$ respectively.
Hence $|w\,\varphi(s + \mu(1 - z))| \le \varphi(\mu\epsilon) < (1 - \epsilon)^m$ if $|z| = 1 - \epsilon$ and $\epsilon$ is small
enough. That is, in each of the three cases $|w\,\varphi(s + \mu(1 - z))| < (1 - \epsilon)^m$ if
$|z| = 1 - \epsilon$ and $\epsilon > 0$ is small enough. Thus it follows by Rouché's theorem
that (4) has exactly m roots $z = \gamma_r(s, w)$, $r = 1, 2, \cdots, m$, in the circle
$|z| < 1 - \epsilon$, or
$$z = \epsilon_r\,[w\,\varphi(s + \mu(1 - z))]^{1/m}$$
has exactly one root $z = \gamma_r(s, w)$ in the circle $|z| < 1 - \epsilon$. The explicit form
(5) of $\gamma_r(s, w)$ can be obtained by Lagrange's expansion. (Cf. e.g., E. T. Whittaker
and G. N. Watson [5] p. 132.) This completes the proof of the lemma.
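The form $z = \epsilon_r[w\,\varphi(s + \mu(1 - z))]^{1/m}$ used in the proof also suggests a simple way to compute the roots numerically by repeated substitution. The sketch below is illustrative only: it assumes exponentially distributed interarrival times, so that $\varphi(s) = \text{rate}/(\text{rate} + s)$, and parameter values for which the substitution is a contraction on the unit disk. Neither assumption comes from the paper, and for other F or other parameters the iteration need not converge.

```python
import cmath


def lst_exponential(s, rate):
    """Laplace-Stieltjes transform of an exponential distribution (an assumed F)."""
    return rate / (rate + s)


def roots_in_unit_circle(s, w, m, mu, rate, iterations=200):
    """Iterate z <- eps_r * [w * phi(s + mu * (1 - z))] ** (1/m) for r = 1, ..., m."""
    roots = []
    for r in range(1, m + 1):
        eps_r = cmath.exp(2j * cmath.pi * r / m)  # r-th of the m-th roots of unity
        z = 0j
        for _ in range(iterations):
            z = eps_r * (w * lst_exponential(s + mu * (1 - z), rate)) ** (1.0 / m)
        roots.append(z)
    return roots


def residual(z, s, w, m, mu, rate):
    """Absolute error in equation (4) at the point z."""
    return abs(z ** m - w * lst_exponential(s + mu * (1 - z), rate))


# Illustrative parameter values only.
params = dict(s=0.5, w=0.8, m=3, mu=2.0, rate=1.0)
for z in roots_in_unit_circle(**params):
    print(z, abs(z), residual(z, **params))
```

Each of the m starting branches should settle on a different root with modulus below 1, in line with the lemma.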

We note that the roots $z = \gamma_r(s, w)$, $r = 1, 2, \cdots, m$, of the equation (4)
are regular functions of s and w and by analytical continuation they can be defined
also in case $\mu\alpha \le m$ for $\Re(s) \ge 0$ and $|w| \le 1$ without changing (5). We
have always $|\gamma_r(s, w)| \le 1$, $r = 1, 2, \cdots, m$, if $\Re(s) \ge 0$ and $|w| \le 1$. Note
also that (4) has at most one root (possibly multiple) on the unit circle $|z| = 1$,
namely z = 1 is a root if $w\,\varphi(s) = 1$. Furthermore $\gamma_r(s, w) = 0$ if and only if
w = 0. If $w \ne 0$ then the roots $\gamma_r(s, w)$, $r = 1, 2, \cdots, m$, are distinct.

We remark further that by forming the Lagrange expansion of $[\gamma_r(s, w)]^k$,
$r = 1, 2, \cdots, m$, we can prove that
$$\sum_{r=1}^{m} [\gamma_r(s, w)]^k = k \sum_{j \ge k/m} \frac{w^j}{j} \int_0^\infty e^{-(s+\mu)x}\, \frac{(\mu x)^{mj-k}}{(mj - k)!}\, dF_j(x), \tag{6}$$
where $F_j(x)$ denotes the jth iterated convolution of F(x) with itself. By using
(6) we can obtain explicit formulas for the probabilities considered in this paper.

Finally, we introduce the following abbreviations: $\gamma_r(s) = \gamma_r(s, 1)$, $g_r(w) = \gamma_r(0, w)$
and $\omega_r = \gamma_r(0, 1)$. They are the roots in z in the unit circle of the
equations $z^m = \varphi(s + \mu(1 - z))$, $z^m = w\,\varphi(\mu(1 - z))$ and $z^m = \varphi(\mu(1 - z))$
respectively.


3. The transient behavior of $\{\xi_n\}$. Define $a^+ = \max(a, 0)$. It is easy to see that
$$\xi_{n+1} = [\xi_n + m - \nu_n]^+, \tag{7}$$
where $\{\nu_n\}$ is a sequence of identically distributed, mutually independent random
variables with the distribution
$$P\{\nu_n = j\} = \int_0^\infty e^{-\mu x}\, \frac{(\mu x)^j}{j!}\, dF(x), \quad j = 0, 1, \cdots. \tag{8}$$
Accordingly the sequence of random variables $\{\xi_n\}$ forms a homogeneous Markov
chain. We say that the system is in state $E_k$ at the nth step if $\xi_n = k$.
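The recursion (7) is easy to simulate: given an interarrival time $\theta_n = x$, the variable $\nu_n$ in (8) is the number of points of a Poisson process of rate $\mu$ falling in an interval of length x. The sketch below assumes exponentially distributed interarrival times purely for illustration (the paper allows any F), and the function names and parameter values are hypothetical.

```python
import random


def step(xi, m, nu):
    """One step of (7): xi_{n+1} = [xi_n + m - nu_n]^+."""
    return max(xi + m - nu, 0)


def sample_nu(theta, mu, rng):
    """Number of points of a rate-mu Poisson process in an interval of length theta."""
    count, elapsed = 0, rng.expovariate(mu)
    while elapsed <= theta:
        count += 1
        elapsed += rng.expovariate(mu)
    return count


def simulate(n_steps, m, mu, arrival_rate, seed=0):
    """Path xi_0, ..., xi_n of the embedded chain, started from xi_0 = 0."""
    rng = random.Random(seed)
    xi, path = 0, [0]
    for _ in range(n_steps):
        theta = rng.expovariate(arrival_rate)  # assumed exponential interarrival time
        xi = step(xi, m, sample_nu(theta, mu, rng))
        path.append(xi)
    return path


path = simulate(n_steps=1000, m=2, mu=5.0, arrival_rate=1.0)
print(max(path), sum(path) / len(path))
```

With these values $\mu\alpha = 5 > m = 2$, so the simulated queue sizes should remain moderate rather than drift upward.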







The higher transition probabilities
$$p_{ik}^{(n)} = P\{\xi_n = k \mid \xi_0 = i\}$$
can be obtained by the following
THEOREM 1. If $|z| \le 1$, $|w| < 1$, and $|y| < 1$ then we have
$$(1 - y)(1 - w) \sum_{i=0}^{\infty}\sum_{k=0}^{\infty}\sum_{n=0}^{\infty} p_{ik}^{(n)}\, y^i z^k w^n = \prod_{r=1}^{m} \left(\frac{1 - g_r(w)}{1 - z\,g_r(w)}\right) - \frac{y(1 - z)(1 - w)}{(1 - zy)[y^m - w\,\varphi(\mu(1 - y))]} \prod_{r=1}^{m} \left(\frac{y - g_r(w)}{1 - z\,g_r(w)}\right), \tag{9}$$
where $g_r(w)$, $r = 1, 2, \cdots, m$, are the m roots in z of the equation
$$z^m = w\,\varphi(\mu(1 - z)) \tag{10}$$
in the unit circle $|z| < 1$.

Instead of proving this theorem we shall prove the more general Theorem 2
from which Theorem 1 can be deduced as a particular case. Theorem 2 determines
the joint distribution of $\tau_n$ and $\xi_n$ which we need at the investigation of
the stochastic law of the busy period. Theorem 2 can be proved in exactly the
same way as the more special Theorem 1.

The joint distribution of the random variables $\tau_n$ and $\xi_n$ is determined by the
probabilities
$$P_{ik}^{(n)}(x) = P\{\tau_n \le x,\ \xi_n = k \mid \xi_0 = i\}$$
and these probabilities can be uniquely determined by the Laplace-Stieltjes
transforms
$$\pi_{ik}^{(n)}(s) = \int_0^\infty e^{-sx}\, dP_{ik}^{(n)}(x).$$


These Laplace-Stieltjes transforms are given by
THEOREM 2. If $\Re(s) \ge 0$, $|z| \le 1$, $|w| < 1$, and $|y| < 1$ then we have
$$(1 - y)[1 - w\,\varphi(s)] \sum_{i=0}^{\infty}\sum_{k=0}^{\infty}\sum_{n=0}^{\infty} \pi_{ik}^{(n)}(s)\, y^i z^k w^n = \prod_{r=1}^{m} \left(\frac{1 - \gamma_r(s, w)}{1 - z\,\gamma_r(s, w)}\right) - \frac{y(1 - z)[1 - w\,\varphi(s)]}{(1 - zy)[y^m - w\,\varphi(s + \mu(1 - y))]} \prod_{r=1}^{m} \left(\frac{y - \gamma_r(s, w)}{1 - z\,\gamma_r(s, w)}\right), \tag{11}$$
where $\gamma_r(s, w)$, $r = 1, 2, \cdots, m$, are the m roots in z of the equation
$$z^m = w\,\varphi(s + \mu(1 - z)) \tag{12}$$
in the unit circle $|z| < 1$.

Proof. If w = 0 then the theorem is obviously true, therefore we suppose
that $w \ne 0$. We shall use only the following theorem of the theory of functions
of a complex variable: If f(z) is regular for all finite values of z and
$$\lim_{|z| \to \infty} [f(z)/|z|^k] = 0,$$
then f(z) is a polynomial of degree < k. If k = 1 then f(z) is constant.
Let us introduce the generating function
$$\Pi_i^{(n)}(s, z) = \sum_{j=0}^{\infty} \pi_{ij}^{(n)}(s)\, z^j$$
which is convergent if $|z| \le 1$ and $\Re(s) \ge 0$. We shall show that if $|z| = 1$ then
$\Pi_i^{(n)}(s, z)$, $n = 0, 1, \cdots$, satisfies the following recurrence formula
$$\Pi_i^{(n+1)}(s, z) = z^m\, \varphi(s + \mu(1 - (1/z)))\, \Pi_i^{(n)}(s, z) + \sum_{j=0}^{\infty} C_{ij}^{(n)}(s)\,(1 - (1/z^j)), \tag{13}$$
where for every i and n, $\sum_{j=0}^{\infty} |C_{ij}^{(n)}(s)| \le 1$.
We have
$$\Pi_i^{(n)}(s, z) = E\{e^{-s\tau_n} z^{\xi_n} \mid \xi_0 = i\}$$
and further
$$\tau_{n+1} = \tau_n + \theta_n \tag{14}$$
and
$$\xi_{n+1} = [\xi_n + m - \nu_n]^+, \tag{15}$$
where $\{\theta_n, \nu_n\}$ is a sequence of independent vector random variables with distributions
$P\{\theta_n \le x\} = F(x)$ and
$$P\{\nu_n = j \mid \theta_n = x\} = e^{-\mu x}\,[(\mu x)^j/j!], \quad j = 0, 1, \cdots.$$
By (14) and (15) we obtain (13). The first term on the right hand side of (13)
is $E\{e^{-s\tau_{n+1}} z^{\xi_n + m - \nu_n} \mid \xi_0 = i\}$. To obtain $E\{e^{-s\tau_{n+1}} z^{\xi_{n+1}} \mid \xi_0 = i\}$ we have to omit
from this the terms corresponding to the values $\xi_n + m - \nu_n = -1, -2, \cdots$
and take into consideration that $\xi_{n+1} = 0$ if and only if $\xi_n + m - \nu_n \le 0$.
Thus we obtain the second term on the right hand side of (13), where
$$C_{ij}^{(n)}(s) = P\{\xi_n + m - \nu_n = -j \mid \xi_0 = i\}\; E\{e^{-s\tau_{n+1}} \mid \xi_n + m - \nu_n = -j,\ \xi_0 = i\}.$$
To obtain (13) we also used the relation
$$E\{e^{-s\theta_n} z^{-\nu_n}\} = \varphi(s + \mu(1 - (1/z)))$$
if $|z| = 1$.

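Now let $\Re(s) \ge 0$, $|z| \le 1$, $|w| < 1$, $|y| < 1$ and define
$$A_i(z, s, w) = \sum_{n=0}^{\infty} \Pi_i^{(n)}(s, z)\, w^n$$
and
$$A(z, s, w, y) = \sum_{i=0}^{\infty} A_i(z, s, w)\, y^i.$$
Clearly
$$A(z, s, w, y) = \sum_{i=0}^{\infty}\sum_{k=0}^{\infty}\sum_{n=0}^{\infty} \pi_{ik}^{(n)}(s)\, y^i z^k w^n \tag{16}$$
and by definition A(z, s, w, y) is a regular function of z if $|z| \le 1$, $\Re(s) \ge 0$,
$|w| < 1$, and $|y| < 1$. If $|z| = 1$ then by (13) we have
$$A_i(z, s, w) = \frac{z^i + \sum_{n=1}^{\infty}\sum_{j=0}^{\infty} C_{ij}^{(n-1)}(s)\, w^n\,(1 - (1/z^j))}{1 - w z^m\, \varphi(s + \mu(1 - (1/z)))}$$
and hence if $|z| = 1$
$$A(z, s, w, y) = \frac{(1 - zy)^{-1} + \sum_{j=0}^{\infty} C_j(s, w, y)\,(1 - (1/z^j))}{1 - w z^m\, \varphi(s + \mu(1 - (1/z)))}, \tag{17}$$
where the coefficients
$$C_j(s, w, y) = \sum_{n=1}^{\infty}\sum_{i=0}^{\infty} C_{ij}^{(n-1)}(s)\, w^n y^i$$
satisfy the following condition
$$\sum_{j=0}^{\infty} |C_j(s, w, y)| \le |w|/(1 - |w|)(1 - |y|).$$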

Now let us define A(z, s, w, y) also for $|z| > 1$ by (17) if $\Re(s) \ge 0$, $|w| < 1$,
and $|y| < 1$. Thus A(z, s, w, y) has singularities only at z = 1/y and at the
zeros of the denominator of (17) outside the unit circle. These zeros evidently
agree with the reciprocal values of the roots of (4) inside the unit circle. If we
define
$$B(z, s, w, y) = A(z, s, w, y)\,(1 - zy) \prod_{r=1}^{m} (1 - z\,\gamma_r(s, w)) \tag{18}$$
then B(z, s, w, y) will be a regular function of z in the whole complex plane.
Since obviously
$$\lim_{|z| \to \infty} [B(z, s, w, y)/|z|^2] = 0$$
therefore B(z, s, w, y) is a linear function of z, that is,
$$B(z, s, w, y) = B_0(s, w, y) + z B_1(s, w, y). \tag{19}$$
$B_0(s, w, y)$ and $B_1(s, w, y)$ can be determined as follows: We have clearly
$$A(1, s, w, y) = \sum_{i=0}^{\infty}\sum_{n=0}^{\infty} [\varphi(s)]^n y^i w^n = \frac{1}{(1 - y)[1 - w\,\varphi(s)]}$$
and hence by (18)
$$B(1, s, w, y) = \frac{1}{[1 - w\,\varphi(s)]} \prod_{r=1}^{m} (1 - \gamma_r(s, w)). \tag{20}$$
Further by (17)
$$\lim_{z \to 1/y} (1 - zy)\,A(z, s, w, y) = y^m/[y^m - w\,\varphi(s + \mu(1 - y))]$$
and hence by (18)
$$B\left(\frac{1}{y}, s, w, y\right) = \frac{1}{[y^m - w\,\varphi(s + \mu(1 - y))]} \prod_{r=1}^{m} (y - \gamma_r(s, w)). \tag{21}$$
Thus (19) is determined by (20) and (21). Finally A(z, s, w, y) can be obtained
by (18). So we get (11) which was to be proved. It is to be remarked that in
the above proof we did not exploit the fact that the roots $\gamma_r(s, w)$,
$r = 1, 2, \cdots, m$, are distinct.

Remark. If we restrict ourselves to the case y = 0 in proving (11) then we
have
$$\lim_{|z| \to \infty} [B(z, s, w, 0)/|z|] = 0,$$
i.e., B(z, s, w, 0) is independent of z and thus it is determined by (20). In this
case we obtain by (18) that
$$[1 - w\,\varphi(s)] \sum_{k=0}^{\infty}\sum_{n=0}^{\infty} \pi_{0k}^{(n)}(s)\, z^k w^n = \prod_{r=1}^{m} \left(\frac{1 - \gamma_r(s, w)}{1 - z\,\gamma_r(s, w)}\right), \tag{22}$$
where $\gamma_r(s, w)$, $r = 1, 2, \cdots, m$, are defined in Theorem 2.
To prove Theorem 1 let us note that $p_{ik}^{(n)} = \pi_{ik}^{(n)}(0)$ and thus if s = 0 in
(11) then we get (9). In particular if s = 0 in (22) then we get
$$(1 - w) \sum_{k=0}^{\infty}\sum_{n=0}^{\infty} p_{0k}^{(n)}\, z^k w^n = \prod_{r=1}^{m} \left(\frac{1 - g_r(w)}{1 - z\,g_r(w)}\right), \tag{23}$$
where $g_r(w)$, $r = 1, 2, \cdots, m$, are defined in Theorem 1.

4. The limiting distribution of $\{\xi_n\}$. Using Theorem 1 we shall prove
THEOREM 3. If $\mu\alpha > m$ then the limiting probability distribution
$$\lim_{n \to \infty} P\{\xi_n = k\} = P_k, \quad k = 0, 1, \cdots$$
exists irrespective of the initial distribution. We have
$$\sum_{k=0}^{\infty} P_k z^k = \prod_{r=1}^{m} \left(\frac{1 - \omega_r}{1 - z\,\omega_r}\right), \tag{24}$$
where $\omega_r$, $r = 1, 2, \cdots, m$, are the m roots in z of the equation
$$z^m = \varphi(\mu(1 - z)) \tag{25}$$
in the unit circle $|z| < 1$.
Proof. Since $\{\xi_n\}$ is an irreducible and aperiodic Markov chain, the limit
$\lim_{n \to \infty} p_{ik}^{(n)} = P_k$ always exists irrespective of i and either every $P_k > 0$ and
$\{P_k\}$ is a probability distribution or every $P_k = 0$. Let i = 0. Using (23) by
Abel's theorem we get
$$\sum_{k=0}^{\infty} P_k z^k = \lim_{w \to 1}\, (1 - w) \sum_{k=0}^{\infty}\sum_{n=0}^{\infty} p_{0k}^{(n)}\, z^k w^n = \prod_{r=1}^{m} \left(\frac{1 - \omega_r}{1 - z\,\omega_r}\right).$$
If $\mu\alpha > m$ then $\{P_k\}$ is a proper probability distribution, because $|\omega_r| < 1$,
$r = 1, 2, \cdots, m$. If $\mu\alpha \le m$ then $\omega_m = 1$ and therefore $P_k = 0$ for every k.
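For m = 1 the product in (24) has a single factor and $\{P_k\}$ is geometric with ratio $\omega_1$. As an illustrative check outside the paper's own argument, assume exponential interarrival times with rate `rate`, so that $\varphi(s) = \text{rate}/(\text{rate} + s)$ and $\alpha = 1/\text{rate}$. Equation (25) then reduces to a quadratic whose root inside the unit circle is $\text{rate}/\mu$. The sketch below recovers that root by repeated substitution; the function names are hypothetical.

```python
def smallest_root(mu, rate, iterations=500):
    """Repeated substitution for z = phi(mu * (1 - z)) with phi(s) = rate / (rate + s), m = 1."""
    z = 0.0
    for _ in range(iterations):
        z = rate / (rate + mu * (1.0 - z))
    return z


def limiting_distribution(omega, n_terms):
    """Coefficients of (1 - omega) / (1 - omega * z): P_k = (1 - omega) * omega ** k."""
    return [(1.0 - omega) * omega ** k for k in range(n_terms)]


omega = smallest_root(mu=2.0, rate=1.0)  # here mu * alpha = 2 > m = 1
print(omega, limiting_distribution(omega, 5))
```

The computed root should equal rate/mu = 0.5, giving the familiar geometric distribution for the queue size seen by an arriving customer in this special case.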
Another consequence of Theorem 1 is


THEOREM 4. Denote by $f_{00}^{(n)}$ the probability that in the Markov chain $\{\xi_n\}$ starting
from state $E_0$ the first return occurs at the nth step. If $|w| < 1$ then
$$\sum_{n=1}^{\infty} f_{00}^{(n)} w^n = 1 - \frac{1 - w}{\prod_{r=1}^{m} [1 - g_r(w)]}, \tag{26}$$
where $g_r(w)$, $r = 1, 2, \cdots, m$, are the m roots in z of the equation
$$z^m = w\,\varphi(\mu(1 - z)) \tag{27}$$
in the unit circle $|z| < 1$.
Proof. By the theory of Markov chains it follows that
$$\sum_{n=1}^{\infty} f_{00}^{(n)} w^n = \frac{\sum_{n=1}^{\infty} p_{00}^{(n)} w^n}{\sum_{n=0}^{\infty} p_{00}^{(n)} w^n}$$
and (26) can be obtained from (23) with z = 0.


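5. The determination of $F_{ik}^{(n)}(x)$. Let
$$F_{ik}^{(n)}(x) = P\{\tau_n \le x,\ \xi_n = k,\ \xi_1 > 0, \cdots, \xi_{n-1} > 0 \mid \xi_0 = i\}.$$
The Laplace-Stieltjes transform
$$\Phi_{ik}^{(n)}(s) = \int_0^\infty e^{-sx}\, dF_{ik}^{(n)}(x)$$
is given by


THEOREM 5. If $\Re(s) \ge 0$ and $|w| < 1$ then we have
$$\sum_{n=1}^{\infty} \Phi_{ik}^{(n)}(s)\, w^n = \sum_{n=1}^{\infty} \pi_{ik}^{(n)}(s)\, w^n - \frac{\left(\sum_{n=1}^{\infty} \pi_{i0}^{(n)}(s)\, w^n\right)\left(\sum_{n=1}^{\infty} \pi_{0k}^{(n)}(s)\, w^n\right)}{1 + \sum_{n=1}^{\infty} \pi_{00}^{(n)}(s)\, w^n}, \tag{28}$$
where the expressions on the right hand side are defined by (11).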
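Proof. By the theorem of total probability we get
$$P_{ik}^{(n)}(x) = F_{ik}^{(n)}(x) + \sum_{j=1}^{n-1} \int_0^x P_{0k}^{(n-j)}(x - y)\, dF_{i0}^{(j)}(y)$$
and forming Laplace-Stieltjes transforms we have
$$\pi_{ik}^{(n)}(s) = \Phi_{ik}^{(n)}(s) + \sum_{j=1}^{n-1} \pi_{0k}^{(n-j)}(s)\, \Phi_{i0}^{(j)}(s).$$
Hence
$$\sum_{n=1}^{\infty} \pi_{ik}^{(n)}(s)\, w^n = \sum_{n=1}^{\infty} \Phi_{ik}^{(n)}(s)\, w^n + \left(\sum_{n=1}^{\infty} \pi_{0k}^{(n)}(s)\, w^n\right)\left(\sum_{n=1}^{\infty} \Phi_{i0}^{(n)}(s)\, w^n\right). \tag{29}$$
If k = 0 in (29) then we get (28) for k = 0, whence (28) follows for every k
by (29).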

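By (28) and (11) we conclude

THEOREM 6. If $\Re(s) \ge 0$ and $|w| < 1$ then we have
$$\sum_{k=0}^{\infty}\sum_{n=1}^{\infty} \Phi_{0k}^{(n)}(s)\, z^k w^n = \prod_{r=1}^{m} \left(\frac{1}{1 - z\,\gamma_r(s, w)}\right) - \frac{1 - w\,\varphi(s)}{\prod_{r=1}^{m} [1 - \gamma_r(s, w)]}, \tag{30}$$
where $\gamma_r(s, w)$, $r = 1, 2, \cdots, m$, are defined in Lemma 1.
In particular if z = 0 in (30) then we have
$$\sum_{n=1}^{\infty} \Phi_{00}^{(n)}(s)\, w^n = 1 - \frac{1 - w\,\varphi(s)}{\prod_{r=1}^{m} [1 - \gamma_r(s, w)]}. \tag{31}$$
Remark. If $F_{00}(x)$ denotes the probability that the distance between two
consecutive transitions $E_0 \to E_m$ is $\le x$, then we have
$$F_{00}(x) = \sum_{n=1}^{\infty} F_{00}^{(n)}(x),$$
and if $\Phi_{00}(s)$ denotes its Laplace-Stieltjes transform then by (31) we get
$$\Phi_{00}(s) = 1 - \frac{1 - \varphi(s)}{\prod_{r=1}^{m} [1 - \gamma_r(s)]}, \tag{32}$$
where $\gamma_r(s) = \gamma_r(s, 1)$, $r = 1, 2, \cdots, m$.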


6. The probability law of the busy period. Denote by $G_n(x)$ the probability
that a busy period consists in serving n batches and its length is $\le x$. Write
$$\Gamma_n(s) = \int_0^\infty e^{-sx}\, dG_n(x)$$
if $\Re(s) \ge 0$. We shall prove
THEOREM 7. If $\Re(s) \ge 0$ and $|w| < 1$ then we have
$$\sum_{n=1}^{\infty} \Gamma_n(s)\, w^n = 1 - \frac{1 - w\left(\frac{\mu}{\mu + s}\right)^m}{\prod_{r=1}^{m} \left(1 - \frac{\mu\,\gamma_r(s, w)}{\mu + s}\right)}, \tag{33}$$
where $\gamma_r(s, w)$, $r = 1, 2, \cdots, m$, are the m roots in z of (4) in the unit circle
$|z| < 1$.

Proof. If w = 0 then (33) is evidently true. Thus we suppose that $w \ne 0$.
By the theorem of total probability we can write that
$$G_1(x) = \mu \int_0^x [1 - F(y)]\, e^{-\mu y}\, \frac{(\mu y)^{m-1}}{(m - 1)!}\, dy$$
and if $n = 2, 3, \cdots$, then
$$G_n(x) = \mu \sum_{k=1}^{\infty} \int_0^x F_{0k}^{(n-1)}(x - y)\,[1 - F(y)]\, e^{-\mu y}\, \frac{(\mu y)^{k+m-1}}{(k + m - 1)!}\, dy.$$

k+m—1 


Hence 


r4(s) i [ e utes (er [1 oe F(z)] dz 


pr)" 


aie . (n—1) —tutee NBT) 
r,(s) = wD, Poi (s) [ aie =i)! 


if n = 2,3, --- . Forming the generating function of {T,,(s)} we get 


[1 — F(x)] dx 


(34) DS Ta(s)eo" = ww D Cal, w) [ gets ies, 1 — F(x)] ae 


n=1 
where C,(s, w) = Oif k = 0,1, --- ,m — 2; Car(s, w) = 1 and 
Cr4m—1(8,W) = D267 (s)w", k=1,2,---. 


Thus by (30) we have 


(35) > Cils, w)2e* = 11 (; ie): 


k=0 r=) 


For fixed s and w write 


1 — z7,(s, w) 


C.(8,w) = a f(z) dz 


ori lejar ze? 


, 
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and by (34) we get 


Were crane) 
37) r,(s)o” = (z) ii ea ae. 
- 2, te) 2m ¢ J z[(u + s)z — pl 
We can integrate term by term because the series is uniformly convergent on
the circle $|z| = 1$. Now the integral on the right hand side of (37) can be evaluated
as $-2\pi i$ times the sum of the residues of the integrand at the poles
$z = 1/\gamma_r(s, w)$, $r = 1, 2, \cdots, m$, outside the unit circle. The residue at
$z = 1/\gamma_r(s, w)$ depends on the value $\varphi(s + \mu(1 - (1/z)))$ at $z = 1/\gamma_r(s, w)$,
but if $z = 1/\gamma_r(s, w)$ then
$$\varphi(s + \mu(1 - (1/z))) = 1/w z^m.$$
Accordingly (37) remains unchanged by the substitution
$$\varphi(s + \mu(1 - (1/z))) = 1/w z^m.$$


Hence 


= 1, > (w — (1/z”)) 
I n ) = . ) 2 ren dz. 
” Ere = £$ MO gee ha 
On the other hand this integral can be evaluated as $2\pi i$ times the sum of the
residues of the integrand at the poles $z = 0$ and $z = \mu/(\mu + s)$ inside the unit
circle. Proceeding in this way, we get


Brine 1 [CY ~bGed) 


where $f(z)$ is defined by (36). This completes the proof of the theorem.

We remark that the above proof would be also valid in the case of multiple
roots $\gamma_r(s, w)$, $r = 1, 2, \dots, m$. For, if $z = \gamma_r(s, w)$ is a root of order $\nu$ then
the residue of the integrand in (37) at $z = 1/\gamma_r(s, w)$ depends on the values

$$\frac{d^j \varphi(s + \mu(1 - (1/z)))}{dz^j} = \frac{d^j [1/(w z^m)]}{dz^j} \qquad (j = 0, 1, \dots, \nu - 1)$$

at $z = 1/\gamma_r(s, w)$.
REMARK. Denote by $G(x)$ the distribution function of the length of the busy
period and let $\Gamma(s)$ be its Laplace-Stieltjes transform. Evidently
$\Gamma(s) = \sum_{n=1}^{\infty} \Gamma_n(s)$ and therefore if $w \to 1$ in (33) we get


(39) r(s) = 1 — —__— =f ; 


(0) 
» @ 


where $\gamma_r(s) = \gamma_r(s, 1)$, $r = 1, 2, \dots, m$.
The probability that a busy period consists in serving n batches is $f_n = \Gamma_n(0)$
and therefore by (33)


> fio w" = _ a a _w) - 
dis I [i — 9-(w)] 


(40) 


in agreement with (26). 
THEOREM 8. Denote by $P_{00}(t)$ the probability that the server is idle at the instant
t given that he was idle at $t = 0$. If $\Re(s) > 0$ then


8 ‘s{1 — ¢(s)] r=l 


-CEy a 
(41) I o“Palt) do? L uts II a 
0 ‘ 


7 a. yr(8) 


where $\gamma_r(s)$, $r = 1, 2, \dots, m$, are the m roots in z of the equation (4) in the unit
circle.

PROOF. Clearly $P_{00}(t) = P\{\xi(t) = 0 \mid \xi(0) = 0\}$. Denote by $M(t)$ the
expectation of the number of transitions $E_0 \to E_m$ occurring in the time interval
$[0, t]$, given that $\xi(0) = 0$. Then we can write that


(42) Po(t) = 1 -| [1 — G(t — x)] dMo(z), 


where
$$M(t) = I(t) + F_{00}(t) + F_{00}(t) * F_{00}(t) + \cdots$$
and $I(t) = 1$ if $t \ge 0$, $I(t) = 0$ if $t < 0$. Since


ne ad 1 
> dMa(t) = ——— ; 
f . ool?) 1 — Bo(s) 


we get by (42) that 


“ —st a 1 as l at I'(s) 
I e"Puo(t) dt = - [1 ie ra | 
where 2o(s) is defined by (32) and $\Gamma(s)$ by (39).

EFFICIENT ESTIMATION OF A REGRESSION PARAMETER FOR
CERTAIN SECOND ORDER PROCESSES


By CHARLOTTE T. STRIEBEL
Lockheed Missiles and Space Division, Sunnyvale, California 
0. Summary. The problem of estimation of a single regression parameter for
a process with fixed known regression function and unknown covariance is
attacked using a Hilbert space representation of the process. Some general
results are obtained which characterize efficiency classes of covariances, that is,
classes for each of which there exists a single estimate that is efficient for all
members. These results are applied to both the discrete parameter and the
continuous parameter stationary process with rational spectral density. Some special
results are also obtained concerning the efficiency of the least square estimate.


1. Introduction. Let $x(t)$ be a second order complex-valued process with mean
value function zero and covariance

(1.1) $E[x(t)\overline{x(s)}] = R(t, s),$

and suppose that the process

(1.2) $y(t) = k\varphi(t) + x(t)$

is observed for the parameter t in a subset $C^T$ of the real line. The function $\varphi(t)$
is known, and the parameter k is to be estimated. The subsets of interest will be
the intervals $(-\infty < t \le T)$ and $(0 \le t \le T)$ for the continuous parameter
process and the integers $(t = T, T - 1, \dots)$ and $(t = T, T - 1, \dots, 0)$ for
the discrete parameter process.

A linear unbiased estimate with finite variance will be represented as a linear
functional

(1.3) $k^T = k^T[y(t),\ t \in C^T],$

which is the limit in quadratic mean of unbiased finite linear combinations of
the $y(t)$ process, that is,

(1.4) $\sum_{i=1}^{M_m} k_{im}^T\, y(t_{im}^T) \to k^T$ as $m \to \infty,$

where

(1.5) $t_{im}^T \in C^T$

and

(1.6) $\sum_{i=1}^{M_m} k_{im}^T\, \varphi(t_{im}^T) = 1.$

The limit $k^T$ of (1.4) is a random variable with finite variance. It can be thought
of as an element of the $L_2$ space over the underlying probability space, it can be
made to correspond to an element in the reproducing kernel Hilbert space defined
by the kernel $R(s, t)$ (see Parzen [10]), or a correspondence can be set up with
elements of another $L_2$ space as will be done in Section 2. However, it seems more
appropriate to use the notation of a linear functional (1.3), since an estimate
must finally be reduced to this form so that it can be applied to elements $y(t)$
of the sample space. Thus the notation $k^T$ will always refer to a particular
sequence of coefficients $\{k_{im}^T\}$ and time points $\{t_{im}^T\}$ satisfying (1.4)-(1.6), and the
expression $k^T[f(t),\ t \in C^T]$ will indicate the limit in the topology of the range
space of $f(t)$ of the sums $\sum_i k_{im}^T f(t_{im}^T)$ provided this limit exists.

Since only linear unbiased estimates will be considered, and the criterion by
which an estimate will be judged is its variance, it is clear that only second order
properties are involved, so that for these purposes the estimation problem is
completely determined by the pair $(R, \varphi)$. An estimate $k^T$ is said to be
asymptotically efficient or simply efficient for the problem $(R, \varphi)$ provided

(1.7) $E(T) = \dfrac{\text{variance } k_0^T}{\text{variance } k^T} \to 1$ as $T \to \infty,$

where $k_0^T$ is the minimum variance unbiased estimate of k for the process (1.2)
with $t \in C^T$. $E(T)$ will be called the efficiency for the problem $(R, \varphi)$.

Interest in efficient estimates arises from the fact that the “best” estimate
$k_0^T$ may be very inconvenient. This estimate is determined by the linear equation

(1.8) $k_0^T[R(t, s),\ t \in C^T] = M^T \varphi(s), \quad s \in C^T,$

where $M^T$ is a constant. For many problems of interest, the solution to this
equation is difficult to exhibit explicitly, and provided it can be computed at all,
it will depend on complete knowledge of $R(t, s)$. Thus, if the function $\varphi(t)$,
which will be called the regression function, is known, but information concerning
the covariance is limited or can be obtained only at considerable expense, it is
desirable to find an estimate that is economical of information concerning $R(t, s)$
in that it is efficient for as wide a class of covariance functions as possible.

The principal estimate that has been proposed is the least square estimate
given, for example, by

(1.9) $k^T = \int_0^T \overline{\varphi(t)}\, y(t)\, dt \Big/ \int_0^T |\varphi(t)|^2\, dt$

for the case $C^T = (0 \le t \le T)$. This estimate has the advantages that it is easy
to compute and requires no knowledge whatever of the covariance. Previous
work on the problem of efficient estimates has been restricted to stationary
processes, that is, $R(t, s) = R(t - s)$, and has been primarily devoted to
determining those combinations $(R, \varphi)$ for which the least square estimate is efficient.
For the continuous parameter Ornstein-Uhlenbeck process,

(1.10) $R(\tau) = e^{-\beta|\tau|},$

and for regression functions

(1.11) $\varphi(t) = t^r e^{i\lambda_0 t},$

where r is a non-negative integer and $\lambda_0$ is a real frequency, Mann and Moranda
[9] proved that the least square estimate is efficient. The author in [13] extended
this result to include regression functions of the form 


(1.12) g(t) = te 


and showed further that for the more general function,

(1.13) $\varphi(t) = \sum_{\alpha=1}^{n} \varphi_\alpha\, t^r e^{i\lambda_\alpha t},$

where the $\varphi_\alpha$ are non-zero constants, the $\lambda_\alpha$ are real and distinct, and $n > 1$, the
least square estimate is not efficient.
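
As a numerical illustration of definition (1.7), the following Python sketch computes the efficiency of the least square estimate for a real, discrete parameter process observed at n consecutive time points. It is a finite-dimensional analogue and is not taken from the paper: the covariance R(t - s) = r^|t - s| (a discrete counterpart of (1.10)), the constant regression function, and the helper names `solve`, `efficiency`, and `ar1_cov` are all assumptions made for the example. The minimum variance estimate is obtained from the finite form of (1.8), that is, by solving R w = phi.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with
    partial pivoting (a is a list of rows, b a list)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def efficiency(cov, phi):
    """Ratio (variance of best linear unbiased estimate) /
    (variance of least square estimate), cf. (1.7), for real data."""
    n = len(phi)
    w = solve(cov, phi)  # finite analogue of (1.8): R w = phi
    var_best = 1.0 / sum(phi[i] * w[i] for i in range(n))
    norm2 = sum(p * p for p in phi)
    quad = sum(phi[i] * cov[i][j] * phi[j]
               for i in range(n) for j in range(n))
    var_ls = quad / norm2 ** 2
    return var_best / var_ls


def ar1_cov(n, r):
    """Assumed example covariance R(t - s) = r**|t - s| on n time points."""
    return [[r ** abs(i - j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for n in (3, 60, 120):
        print(n, efficiency(ar1_cov(n, 0.5), [1.0] * n))
```

For this assumed covariance and a constant regression function the ratio stays below one for the sample sizes tried and moves toward one as the number of observations grows.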

For a much broader class of covariance function and essentially the same re- 
gression functions, this problem was first discussed by Grenander in [2]. Further 
work was carried out by Grenander and Rosenblatt in [3] and [4]. Rosenblatt 
considered some of the same problems in the case of vector-valued time series in 
[11] and extended his results in [12]. Most of these results, together with some 
examples, appear in Chapter 7 of [5]. In this work only the discrete parameter 
case is considered, and the regression functions considered are slightly more 
general than those of the form (1.13). All restrictions on the class of covariances 
are imposed on the equivalent class of spectral densities $f(\lambda)$, which by
assumption exist and satisfy the relation

(1.14) $R(t) = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} e^{i\lambda t} f(\lambda)\, d\lambda$

for a discrete parameter process and

(1.15) $R(t) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\lambda t} f(\lambda)\, d\lambda$

for a continuous parameter process. In the discrete parameter case for positive
continuous spectral density and “slowly increasing” regression function, a
necessary and sufficient condition is given in [5] for the least square estimate to be
efficient. The same theorem is obtained in [13] for the continuous parameter
Ornstein-Uhlenbeck process and regression function of the form (1.13). Theorem
4 in Section 3 extends this result to the continuous parameter processes with
rational spectral density.
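
The spectral relation for a discrete parameter process, R(t) = (1/2π) ∫ e^{iλt} f(λ) dλ over [-π, π], can be checked numerically. The sketch below is an illustration under stated assumptions and is not part of the paper: it takes the rational factor F(z) = (1 - r^2)^{1/2} z / (z - r) with 0 < r < 1 as a hypothetical example, forms f = |F|^2 on the unit circle, and evaluates the integral by the trapezoidal rule. For this choice the covariance should come out as R(t) = r^|t|. The function names are hypothetical.

```python
import cmath
import math


def spectral_density(lam, r):
    """f(lambda) = |F(z)|^2 at z = exp(i*lambda), with the assumed factor
    F(z) = sqrt(1 - r^2) * z / (z - r)."""
    z = cmath.exp(1j * lam)
    big_f = math.sqrt(1.0 - r * r) * z / (z - r)
    return abs(big_f) ** 2


def covariance(t, r, n_grid=4096):
    """Approximate R(t) = (1/2pi) * integral over [-pi, pi] of
    exp(i*lambda*t) * f(lambda) d(lambda) for integer t.
    The integrand is 2*pi-periodic, so equally spaced points with
    equal weights implement the trapezoidal rule."""
    step = 2.0 * math.pi / n_grid
    total = 0.0 + 0.0j
    for k in range(n_grid):
        lam = -math.pi + k * step
        total += cmath.exp(1j * lam * t) * spectral_density(lam, r)
    return (total * step / (2.0 * math.pi)).real


if __name__ == "__main__":
    for t in (0, 1, 3):
        print(t, covariance(t, 0.5), 0.5 ** abs(t))
```

The two printed columns should agree closely, which ties the factorization f = |F|^2 to a concrete covariance.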

In Chapter 1.3 of [6] Grenander and Szegő reproduce a few of the results of
[5] using the methods of Toeplitz forms. In Chapter 1.4, under certain regularity
conditions on $f(\lambda)$, he extends his results to the continuous parameter case for
the single example

(1.16) $\varphi(t) = 1.$

With the exception of those in [6], all the above-mentioned results are derived
for the more general problem

(1.17) $E[y(t)] = \sum_{i=1}^{p} k_i \varphi_i(t),$

where the $k_i$ are unknown parameters and the $\varphi_i(t)$ are known functions. For
$p > 1$, the definition of efficiency used by Mann, Moranda, and Striebel is
different from that used by Rosenblatt and Grenander. For the case $p = 1$, both
agree with definition (1.7) made above. In the present paper only the case $p = 1$
will be considered though it is believed that the results obtained could be
generalized to larger values of p.

In Section 2, for a rather broad class of processes, necessary and sufficient
conditions are given for the existence of an estimate that is efficient for two
problems $(R_1, \varphi)$ and $(R_2, \varphi)$. When such an estimate exists, it will be said that $R_1$
and $R_2$ are efficiency equivalent. In Section 3 these results are applied to the
problem of a stationary process with rational spectral density and regression function
(1.13) where the $\lambda_\alpha$ are complex with $\Im\lambda_\alpha = -a \le 0$. Both the continuous and
discrete cases are considered.


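2. Efficiency equivalence. It will be assumed that $\varphi(t)$ and $R(t, s)$ can be
represented as follows:

(2.1) $R(t, s) = \int_\Lambda \xi(t, \lambda)\, \overline{\xi(s, \lambda)}\, dF(\lambda),$

(2.2) $\varphi(t) = \int_\Lambda \xi(t, \lambda)\, \overline{\Phi^T(\lambda)}\, dF(\lambda), \quad t \in C^T,$

where $\xi(t, \lambda)$ is a complex-valued measurable function on $R \times R$, the set

(2.3) $\Lambda = \bigcup_T \bigcup_{t \in C^T} \{\lambda \mid \xi(t, \lambda) \ne 0\}$

is measurable, F is a measure on the subspace $(\Lambda, \mathcal{B})$ of the reals, and $\Phi^T(\lambda)$ is
in the linear span $L^T(F)$ of $\{\xi(t, \lambda),\ t \in C^T\}$ in the Hilbert space $L_2(\Lambda, \mathcal{B}, F)$.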
Under these assumptions it follows that to each unbiased linear estimate $k^T$
with finite variance there corresponds an element $\eta^T(\lambda)$ in the subspace $L^T(F)$
such that

(2.4) $\eta^T(\lambda) = k^T[\xi(t, \lambda),\ t \in C^T],$

(2.5) $(\eta^T, \Phi^T) = 1,$

and

(2.6) variance $k^T = (\eta^T, \eta^T).$

The function $\eta^T(\lambda)$ corresponding to $k^T$ is unique a.s. F. A minimum variance
unbiased estimate $k_0^T$ exists,

(2.7) $\dfrac{\Phi^T(\lambda)}{(\Phi^T, \Phi^T)} = k_0^T[\xi(t, \lambda),\ t \in C^T],$

and

(2.8) variance $k_0^T = 1/(\Phi^T, \Phi^T).$

These results are fairly standard and can be obtained, for example, from more
general results by Parzen [10].

The cases which will be considered in the next section are

$\xi(t, \lambda) = (2\pi)^{-1/2} e^{i\lambda t},$

$\Lambda = [-\pi, \pi]$ for the discrete and $\Lambda = (-\infty, \infty)$ for the continuous parameter
stationary process. The solution $\Phi^T(\lambda)$ of the equation (2.2) will be found by
the Wiener-Hopf technique for $C^T$ half-infinite.

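Let $F_i$ be measures for which there exists $\Phi_i^T$ satisfying (2.2). Consider $\eta_{ij}^T(\lambda)$
in $L^T(F_i)$ which corresponds to an unbiased estimate $k_j^T$ for the problem $(R_i, \varphi)$.
The following measures can then be defined:

(2.9) $\mu_i^T(B) = \int_B |\Phi_i^T(\lambda)|^2\, dF_i(\lambda) \Big/ \int_\Lambda |\Phi_i^T(\lambda)|^2\, dF_i(\lambda),$

(2.10) $\nu_{ij}^T(B) = \int_B |\eta_{ij}^T(\lambda)|^2\, dF_i(\lambda) \Big/ \int_\Lambda |\eta_{ij}^T(\lambda)|^2\, dF_i(\lambda).$

The first subscript on $\eta_{ij}^T$ will be omitted when it is clear what problem $(F_i, \varphi)$
is intended. The efficiency $E_{ij}(T)$ for the estimate $\eta_j^T(\lambda)$ for the problem $(F_i, \varphi)$
is given by

(2.11) $\dfrac{1}{E_{ij}(T)} = \int_\Lambda |\eta_j^T(\lambda)|^2\, dF_i(\lambda) \int_\Lambda |\Phi_i^T(\lambda)|^2\, dF_i(\lambda).$

LEMMA 1. If $\eta_j^T(\lambda)$ is unbiased and efficient for $(F_i, \varphi)$, then

$|\mu_i^T(B) - \nu_{ij}^T(B)| \to 0$ as $T \to \infty$

uniformly for $B \in \mathcal{B}$.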
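PROOF. The subscripts will be omitted in the proof. Let

$a_T = \left[\int |\eta^T(\lambda)|^2\, dF(\lambda)\right]^{1/2}, \quad b_T = \left[\int |\Phi^T(\lambda)|^2\, dF(\lambda)\right]^{1/2};$

then

$$|\mu^T(B) - \nu^T(B)| \le \int_B \left| \frac{|\Phi^T(\lambda)|^2}{b_T^2} - \frac{|\eta^T(\lambda)|^2}{a_T^2} \right| dF(\lambda)$$
$$\le \int \left| \frac{\Phi^T(\lambda)}{b_T} - \frac{\eta^T(\lambda)}{a_T} \right| \left( \frac{|\Phi^T(\lambda)|}{b_T} + \frac{|\eta^T(\lambda)|}{a_T} \right) dF(\lambda)$$
$$\le \left\{ \int \left| \frac{\Phi^T}{b_T} - \frac{\eta^T}{a_T} \right|^2 dF \right\}^{1/2} \left\{ \int \left( \frac{|\Phi^T|}{b_T} + \frac{|\eta^T|}{a_T} \right)^2 dF \right\}^{1/2}$$
$$= \left\{ 2 - \frac{2\Re(\eta^T, \Phi^T)}{a_T b_T} \right\}^{1/2} \left\{ 2 + \frac{2(|\eta^T|, |\Phi^T|)}{a_T b_T} \right\}^{1/2}.$$

The first inequality takes the absolute value under the integral, the second uses
the elementary inequality

$\big|\, |a|^2 - |b|^2 \,\big| \le |a - b|\,(|a| + |b|);$

and the third is the Schwarz inequality. Since the estimate is unbiased,

$(\eta^T, \Phi^T) = 1,$

and, since it is efficient,

$1/E(T) = (\eta^T, \eta^T)(\Phi^T, \Phi^T) = a_T^2 b_T^2 \to 1.$

Hence the first factor above tends to zero while the second remains bounded.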


LEMMA 2. Let $\eta_0^T$ and $\eta_1^T$ be unbiased and efficient for $(R, \varphi)$. If $\nu_{11}^T$ converges
weakly to a measure $N_{11}$,

$\nu_{11}^T \xrightarrow{w} N_{11},$

then $\nu_{10}^T$ also converges weakly to that measure,

$\nu_{10}^T \xrightarrow{w} N_{11}.$

Complete convergence of $\nu_{11}^T$,

$\nu_{11}^T \xrightarrow{c} N_{11},$

implies complete convergence of $\nu_{10}^T$,

$\nu_{10}^T \xrightarrow{c} N_{11}.$

The terms weak and strong convergence are according to Loève [8]. This lemma
is immediate from Lemma 1.

When it is said that $k_0^T$ or $\eta_0^T$ is an estimate for two measures $F_1$ and $F_2$, the
following is intended: there are sequences $\{k_{im}^T\}$ and $\{t_{im}^T\}$ satisfying (1.4)-(1.6)
where convergence in quadratic mean in (1.4) holds for both $R_1$ and $R_2$ or
equivalently the sequence of functions

(2.12) $\eta_m^T(\lambda) = \sum_i k_{im}^T\, \xi(t_{im}^T, \lambda)$

converges to $\eta_0^T(\lambda)$ in $L_2(F_1)$ norm and in $L_2(F_2)$ norm.

THEOREM 1. (i) Let A be a countable union of intervals on which $dF_2(\lambda)/dF_1$
exists and is continuous except for a countable number of discontinuities. Consider
a sequence T for which the following are satisfied. (ii) There exist estimates $\eta_{ii}^T(\lambda)$
unbiased and efficient for $(F_i, \varphi)$, $i = 1, 2$, for which

$\nu_{ii}^T \xrightarrow{w} N_{ii}, \quad i = 1, 2,$ and $N_{22}(A) \ne 0.$

Then if there exists an estimate $\eta_0^T(\lambda)$ that is unbiased and efficient for $F_1$ and $F_2$,
it follows that $N_{11}$ and $N_{22}$ must satisfy the following condition: (iii) For all $B \in \mathcal{B}$

(2.13) $\int_{B \cap A} \dfrac{dF_2(\lambda)}{dF_1}\, dN_{11}(\lambda) = c\, N_{22}(B \cap A)$

where

(2.14) $c = \lim_{T \to \infty} \dfrac{\int |\eta_0^T(\lambda)|^2\, dF_2(\lambda)}{\int |\eta_0^T(\lambda)|^2\, dF_1(\lambda)}.$


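PROOF. Let $(a, b) = B^*$ be an interval contained in A on which $dF_2(\lambda)/dF_1$
is continuous, then

$$\int_{B^*} \frac{dF_2(\lambda)}{dF_1}\, d\nu_{10}^T(\lambda) = c(T)\, \nu_{20}^T(B^*),$$

where

$$c(T) = \frac{\int |\eta_0^T(\lambda)|^2\, dF_2(\lambda)}{\int |\eta_0^T(\lambda)|^2\, dF_1(\lambda)} = \frac{E_{10}(T)\, E_{22}(T) \int |\eta_{22}^T(\lambda)|^2\, dF_2(\lambda)}{E_{20}(T)\, E_{11}(T) \int |\eta_{11}^T(\lambda)|^2\, dF_1(\lambda)}.$$

From Lemma 2 and (ii), since $\eta_0^T$ is also efficient for $F_i$,

$\nu_{i0}^T \xrightarrow{w} N_{ii}.$

By the Helly-Bray Lemma

$$\int_{B^*} \frac{dF_2(\lambda)}{dF_1}\, d\nu_{10}^T(\lambda) \to \int_{B^*} \frac{dF_2(\lambda)}{dF_1}\, dN_{11}(\lambda).$$

Since $N_{22}(A) \ne 0$, there exists an interval $B^*$ in A such that $N_{22}(B^*) \ne 0$,
thus

$$c(T) = \frac{\int |\eta_0^T(\lambda)|^2\, dF_2(\lambda)}{\int |\eta_0^T(\lambda)|^2\, dF_1(\lambda)} \to c = \frac{\int_{B^*} \frac{dF_2(\lambda)}{dF_1}\, dN_{11}(\lambda)}{N_{22}(B^*)}.$$

The measurable sets in A are generated by intervals of this type, so (2.13) must
also hold for all $B \in \mathcal{B}$.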

THEOREM 2. If in addition to assumptions (i)-(iii), $c \ne 0$, $A = \Lambda$, $dF_2/dF_1$
is bounded and

$\nu_{11}^T \xrightarrow{c} N_{11},$

then $\eta_0^T(\lambda)$ efficient and unbiased for $F_1$ implies $\eta_0^T(\lambda)$ is also an efficient unbiased
estimate for $F_2$.

PROOF. Let $\{k_{im}^T\}$, $\{t_{im}^T\}$ be a sequence of simple estimates (1.4) which
converges to $k_0^T$ in quadratic mean, with respect to $F_1$. Then for the corresponding
$\{\eta_m^T(\lambda)\}$ given by (2.12)

$$\int |\eta_0^T(\lambda) - \eta_m^T(\lambda)|^2\, dF_2 = \int \frac{dF_2}{dF_1}\, |\eta_0^T(\lambda) - \eta_m^T(\lambda)|^2\, dF_1(\lambda) \le M \int |\eta_0^T(\lambda) - \eta_m^T(\lambda)|^2\, dF_1(\lambda) \to 0,$$

where M is the bound of $dF_2/dF_1$. Thus (1.4) also converges to an estimate which
corresponds to $\eta_0^T(\lambda)$ with respect to $F_2$.


BT) ~ El T)ERAT) f 


= dvi 


dF, 


T 2 . 
) 1 . 
] Ey(T) ma = aF | dk’, 
»9 ( 


n (dr) , dF, 
in 3 {= - dNu(d) = Nw(A) < 1. 
c dF; 


This depends on Lemma 2 and the Helly-Bray Theorem in Section 11.3 of [8]. 
3. Rational spectral density and regression function. In this section the discrete
and the continuous parameter stationary process will be considered. Thus

(3.1) $\xi(t, \lambda) = (2\pi)^{-1/2} e^{i\lambda t},$

(3.2) $\Lambda = [-\pi, \pi]$

for the discrete parameter process, and

(3.3) $\Lambda = (-\infty, \infty)$

for the continuous parameter process, and the representation (2.1) is given by
(1.14) and (1.15), respectively. The case of $C^T$ half-infinite will be considered
first. For the discrete parameter t and T are integers, and

(3.4) $C^T = (T, T - 1, \dots);$

for the continuous parameter

(3.5) $C^T = (-\infty, T).$

It will be assumed that the spectral densities $f(z)$ and $f(\lambda)$ are positive rational
functions where for convenience in the discrete parameter case the density will
be treated as a function of $z = e^{i\lambda}$. The densities can be factored

(3.6) $f(z) = |F(z)|^2,$

(3.7) $f(\lambda) = |F(\lambda)|^2.$

For the discrete process $F(z)$ is a quotient of two polynomials each of the
same degree and having zeros inside the unit circle ($|z| < 1$); for the continuous
process $F(\lambda)$ is a proper rational function and has poles and zeros in the upper
half-plane ($\Im\lambda > 0$). (See Doob [1], p. 502 and p. 542.)

The regression function that will be considered has the form

(3.8) $\varphi(t) = \sum_{\nu} \varphi_\nu\, t^{r_\nu} e^{i\lambda_\nu t},$

where $\lambda_\nu$ is complex and

(3.9) $\max_\nu \Re(i\lambda_\nu) = a \ge 0$ and $\varphi_\nu \ne 0.$

The exact form of $\varphi(t)$ for $t < 0$ will be seen to be immaterial for questions of
efficiency as $T \to \infty$. For the discrete parameter case $\varphi(t)$ for $t < 0$ must be such
that the sum


(3.10) @(z) = >> z‘g(t) 
t==—oo 
converges to a rational function in a ring 
a < lz| < b. 


Similarly, in the continuous case the integral

(3.12) $\Phi(\lambda) = \int_{-\infty}^{\infty} e^{-i\lambda t} \varphi(t)\, dt$

must converge to a rational function in the strip

(3.13) $-b < \Im\lambda < -a.$

In this case it will also be assumed that $\Phi(\lambda)/F(\lambda)$ is a proper rational function.
For any given degree e and $\varphi(t)$ given by (3.8) for $t \ge 0$ it is always possible
to define $\varphi(t)$ for $t < 0$ so that the degree of denominator of $\Phi(\lambda)$ exceeds that
of the numerator by e and hence $\Phi(\lambda)/F(\lambda)$ is proper if the net degree of $1/F(\lambda)$
is less than e. In each case the terms of importance in (3.8) are those for which
$\Re(i\lambda_\nu) = a$ and among these the ones for which $r_\nu$ is a maximum. The index of
these terms will be indicated by $\alpha = 1, \dots, n$. The functions $\Phi(z)$ and $\Phi(\lambda)$
can then be expanded as follows:


r 


‘ gir: " 
(3.14) ai 27 


a=] j=0 (z _<7 Za) itt , 


where 
(3.15) 


and 


(3.16) Pan ™ , = t! Ga; 
3.17) &(r i i OS ; 
(3.17) (A) Lean 
Sa —d, 
(3.18) 
Sr or xs —b, 


(3.19) Par = B, = ri ga(—i)™.
Equation (2.2) can be written


(3.20) o. 2a b 2’ "@"(z)|F(z)|’ dz = g(t), 
(2mr)*2 J\z 


=1 


where 
T 


(3.21) @7(z) = >) zk, 


t=—oo 


| 


- e™OT(X) (F(A)? dd = g(t), —«o <t<T, 


where 


At 


(3.23) @7(\) = k’[e™', —0e <ts T). 


Under the assumptions made these equations can easily be solved by the Wiener- 
Hopf technique. (See, for example, [14] p. 313.) Solutions are given by 

T 
| 


(3.24) o*(s) = —_.. 
1(2r)* F(z) t=—« 


and 

(3.25) 

where 

(3.26) Osa<c< b. 


Equation (3.23) can be written as an integral 


T 
(3.27) @7()) = | eK" (t) dt 


if K"(t) is permitted to include delta functions and their derivatives. Formulas 
for the estimates themselves will be given later. 

For the case of $C^T$ half-infinite the “best” estimate $k_0^T$, which is clearly efficient,
will be considered. For this estimate $\eta^T(\lambda)$ is given by (2.7). For a given
spectrum $f_i$, the measure $N_{ii}$ and an asymptotic expression for $\int |\eta_{ii}^T(\lambda)|^2 f_i(\lambda)\, d\lambda$
must be obtained in order to apply the theorems of the previous section. These
can be obtained by a straightforward but somewhat lengthy calculation and
will be given without proof.

Lemma 3. For the discrete case,a > 0 and a sequence T — ~ such that e 
(@", 6”) \ Ga Gp lalp Za Zs 


——_— -—c(F) = nen > 0, 
eet Tr x 2, F (2a) F (2g) (2223 — 1) 


iTRra _, l 
aq) 


(3.28)
and


j &'(z)F(z)) 
(3.29) he rate p32 
(@7, 7) Sa |: 


i F(z fe. fa. = 
For the discrete case, a = 0 and all sequences T > ~ 
(®", @' es l : 


(3.30) c(F) = 1 >0, 
pr * rai &lre > 


and 


1 &'(z)F(z) r 1 = | ¢g ? ; 
(3.31) ee eee a | 8s — @. 
2miz (67, 67) c(F)(2r + 1) p> \F(Za) Ge) 


where 6 is the Dirac delta function. For the continuous parameter case,a > Oand a 
'p , iTRra 
sequence T — « for which « — ll, 
T T 
(@ ,@ ) im =~ <= Gas lals 


(3.32) —,,- > c(F) = — ——— > G, 
rr 22 Na) FAs) (ia — ids) 


and 

®'(X) F(A)! « 1 | Yala 
3.33) eae A a 
(3.33 (67, 6") Gain |21 Fado 


For the continuous case, a = 0 and all sequences T > ~ 


T aT | 2 
oF). (F) = Diet >, 


— es Qr - ; -1) & |FQ.)| 


and 
n 


Se Je" FOA)S l | ee 
(3.35) Ge)  aMGr a Se ar — 





where $\delta$ is a delta function.

All integrals involved here can be evaluated by contour integration in the
complex plane. Simplifications occur due to the fact that terms contributed by
poles at the $z_\alpha$ and $\lambda_\alpha$ for $\alpha = 1, \dots, n$ dominate all others of $\Phi(z)$ and $\Phi(\lambda)$
as well as those of $1/F(z)$ and $1/F(\lambda)$.

THEOREM 3. 

(i) For $a > 0$ and a sequence $T \to \infty$ for which $e^{iT\Re\lambda_\alpha} \to l_\alpha$, there exists an
estimate efficient for spectral densities $f_1$ and $f_2$ if and only if


= Pala Za 
(3.36) Fa(2)| _ e(Fi) | p Fi@a)(@a — 9) 
IFi(z)| (Fs) | 2 Palate | 


2 Fie)ea — 2)
for the discrete parameter process, and


- Gale 
F,(r)| _ (Fi) 2 FeO. =) 
FON a 


=i Filta) (Xe — 2) 


for the continuous parameter process. 

(ii) If this condition is satisfied, then any estimate that is efficient for one is 
also efficient for the other. 

PROOF. Under the assumptions of this section $dF_2(\lambda)/dF_1 = f_2(\lambda)/f_1(\lambda)$ is
continuous and $A = \Lambda$. For the discrete process $f_2(z)/f_1(z)$ is always bounded;
for the continuous process condition (3.37) implies that $f_2(\lambda)/f_1(\lambda)$ is bounded
above and away from zero. In both cases $c = c(F_1)/c(F_2) \ne 0$. Thus Theorems
1 and 2 apply. Expressions (3.36) and (3.37) can be obtained directly from
(2.13) by substituting the appropriate forms from Lemma 3.

THEOREM 4. 

(i) For $a = 0$ and any sequence $T \to \infty$, there exists an estimate efficient for
$f_1$ and $f_2$ if and only if


(3.38) fore) = CFD 


— - (>. ) 
J2 c(F») fi Aa) 


for both the discrete and the continuous parameter process. 

(ii) For the discrete parameter process if (3.38) is satisfied, then any estimate 
that is efficient for one is efficient for the other. 

(iii) For the continuous parameter process if (3.38) is satisfied, $k_0^T$ is an efficient
estimate for $f_1$, and $f_2(\lambda)/f_1(\lambda)$ is bounded; then $k_0^T$ is also efficient for $f_2$.

Proor. As before (3.38) is obtained from (2.13) using Lemma 3, and Theorems 
1 and 2 apply. 

The stronger result of Theorem 3 (ii) is not true in the case $a = 0$ for the
continuous parameter processes, since it is possible to find an efficient estimate
for $f_1$ that depends on derivatives of $y(t)$ which will not exist for $f_2$ if the degree
of $1/f_2$ is less than that of $1/f_1$. This is, of course, the case when $f_2/f_1$ is unbounded.
However, it is possible to find an estimate that is efficient for all $f_2$ satisfying
(3.38). Such an estimate is given by (3.46).

The case of $C^T = (0, 1, \dots, T)$ and $C^T = (0, T)$ can now be treated easily.
Under the assumptions made on f and $\varphi$, a solution to the equation (2.2) does
exist for both the discrete and the continuous parameter process. (See, for
example, Laning and Battin [7], Chapter 8.4.) However, it will not be convenient
to use this as the efficient estimates $\eta_{ii}^T$ required in the theorems of the previous
section. Instead, the “best” estimates for the half-infinite interval will be
computed and truncated. The estimates obtained in this way are of some interest
and will be given explicitly. For $a > 0$ and the discrete case let

1 


(3.39 a ee Sa 
) (2 — cha) F@ dL? Ms-1 ;
then


-iTRAa a 
(3.40) Mk’ = + of T — t) 51 ee ge 


t=(0 a=1 F (za) 
For the continuous case, let 


Il 
(3.41) ———_—$_——— = F,(r M,(A 
(x — Ke) FO) ere 
where $E_\alpha(\lambda)$ is a polynomial
e—1 
(3.42) E.(x) = > fn’ 
j=0 


and $M_\alpha(\lambda)$ is a proper rational function. Let


(3.43) m*(t) = =f e'M,(A) da, 
2m Jw 


then 


~iTe “(—4)3 es 


e—) 
Mk => yy" (T) 5 Pee —$<$$__—__—_* 
(3.44) is =: 


+f y(T — t) Pee 


a= 


MTR" = ¥ y(t) Hf 


t=(0 a=! 6 


fea! 


T 

(< n 4a TT.T ae Gate * 
(3.46) Mk y(t) > Fa) 
In all cases $M^T$ is a constant to be determined so that the estimate is unbiased;
that is, $M^T$ is given by the right side of the expression with $\varphi(t)$ substituted for
$y(t)$. A straightforward computation of their variances shows that these
estimates are efficient for the half-infinite problem discussed above for all sequences
$T \to \infty$. Thus by Lemma 2

$\nu_{i0}^T \to N_{ii}$

where $\eta_{i0}^T$ indicates the estimates (3.40), (3.44), (3.45), and (3.46) for $f_i$, and
$N_{ii}$ are the limit measures given in Lemma 3. The asymptotic forms $c(F_i)$ also
hold for the $1/\int |\eta_{i0}^T(\lambda)|^2\, dF_i(\lambda)$ since

$(\Phi_i^T, \Phi_i^T) = 1/[E_{i0}(T)\,(\eta_{i0}^T, \eta_{i0}^T)].$

Thus Theorems 3 and 4 also hold for $t = 0, 1, \dots, T$ in the discrete case and
$0 \le t \le T$ in the continuous parameter case.

The least square estimate for $C^T$ half-infinite is given by

(3.47) $M^T k^T = \sum_{t=-\infty}^{T} y(t)\, \overline{\varphi(t)}$

and

(3.48) $M^T k^T = \int_{-\infty}^{T} y(t)\, \overline{\varphi(t)}\, dt.$

For the discrete parameter case, $F(z) = 1$ provides a bona fide covariance for
which $\eta_0^T(z)$ for the least square estimate is given by (3.24). Thus from Theorems
3 and 4 in this case the least square estimate is efficient for $F(z)$ if and only if
(3.36), (3.38) hold for $F(z) = F_2(z)$ and $F_1(z) = 1$. In the continuous case
if the least square estimate is efficient then by Lemma 2 $\nu_{11}^T$ and $\nu_{10}^T$ must converge
to the same limit. $N_{11}$, the limit of $\nu_{11}^T$, is given by (3.33) and (3.35) in Lemma 3.
$N_{10}$, the limit of $\nu_{10}^T$, can be computed by use of Lemma 3 and the Helly-Bray
Theorem, since $\eta_0^T(\lambda)$ is identical with $\Phi^T(\lambda)$ except for a constant where $\Phi^T(\lambda)$
is given by (3.25) with $F(\lambda) = 1$. For $a > 0$ this limit is


‘ r sie - Ya L. : F 
(3.49) Nw(B) =e f f0)| Eg] a 


and for a = 0 by 


3.00 Nio(B) = a! S(ra)5(A — Ag) AX. 
(3.50) S Tee Fie) fa Flee 


Thus for a = 0 if the least square estimate is efficient for f(\) it follows that 
(3.51) f(Aa) = constant | ee 


An asymptotic form for $\int |\eta_0^T(\lambda)|^2 f(\lambda)\, d\lambda$ can also be found.


(3.52) ponte na (nr) |? f(A) dd > (2r +1) DS | gal? FOAWD/AY | Ge [PP 
a=1 a 


From this and (3.34) of Lemma 3 it is clear that (3.51) is also sufficient. $N_{10} = N_{11}$
for $a > 0$ becomes


n 2 


l [ Ya l 
(3.53) (n) | 5. oem ala 
. on | 2 (Xa — A) | =| F (Xa) (Ae — 2d) |» 
but this is not possible since $f(\lambda)$ must be proper. Thus for $a > 0$ the least square
estimate is never efficient.

TWO SIMILAR QUEUES IN PARALLEL


By J. F. C. KINGMAN
Statistical Laboratory, University of Cambridge 


1. Introduction. Haight [3] has considered a system consisting of two un- 
bounded single server queues, in which a customer, on arrival, joins the shorter 
queue. In the present paper, we make the simplifying assumption of symmetry 
between the two queues, an assumption that enables us to use generating func- 
tions to study the behavior of the stationary solution. 

Thus we assume that the two servers each have an exponential service time
distribution with unit mean, and that the arrivals form a Poisson process with
mean $2\rho$. If an arriving customer finds that both queues have equal length, he
joins either with probability 1/2.

We first prove that, so long as $\rho < 1$, a state of statistical equilibrium is
reached. Then the equilibrium equations are converted into an equation for a
bivariate generating function, by which this function is given in terms of two
univariate generating functions. These two functions are shown to be
meromorphic, and the positions of, and residues at, their poles are found. This enables us
to express the probabilities as an infinite sum of geometric distributions. It also
provides us with approximations valid when $\rho$ is near unity, such as the result
that the waiting time distribution of a customer is the same as that for a single
queue with traffic intensity $\rho^2$.


2. Limiting behavior of the system. The first problem to be decided is whether 
or not the queue will settle down into a stationary state. Under the assumptions 
that have been made, the lengths of the two queues form a continuous time 
Markov process, and we first prove a lemma referring to these processes in gen- 
eral, giving a sufficient condition for a valid limiting distribution to exist. This 
lemma, which is an extension of a theorem of Foster [2] on the discrete time case, 
is of wide applicability, and it is hoped to publish an account of further exten- 
sions elsewhere. 

We consider an irreducible Markov process $X(t)$, taking a countable number
of values i, and we assume that the limits

$q_{ij} = \lim_{t \to 0} t^{-1}\{P(X(t) = j \mid X(0) = i) - \delta_{ij}\}$

exist and satisfy the conservation conditions $\sum_j q_{ij} = 0$.
LEMMA 1. Let $-q_{ii}$ be bounded. Then the limits

$p_j = \lim_{t \to \infty} P(X(t) = j \mid X(0) = i)$

exist and are independent of i. The $\{p_j\}$ form a probability distribution if and only
if there exist non-negative $y_i$ such that $\sum_{j \ne i} q_{ij} y_j < \infty$ for all i, and

(1) $\sum_{j \ne i} q_{ij}(y_i - y_j) \ge 1$

for all but a finite number of i.
PROOF. Let $Y(n)$ be a discrete time Markov chain with transition probabilities
$q_{ij}/Q$ $(i \ne j)$, where $Q > -q_{ii}$ for each i. Then $Y(n)$ is irreducible, and hence
a Cesàro limit $p_j$ of

$p_{ij}^{(n)} = P(Y(n) = j \mid Y(0) = i)$

exists as $n \to \infty$, i.e.,

$p_j = \lim_{n \to \infty} \dfrac{1}{n} \sum_{k=0}^{n-1} p_{ij}^{(k)}.$

Now define a Poisson process $N(t)$ with $N(0) = 0$, and $E\{N(t)\} = Qt$, and let
$X^*(t) = Y\{N(t)\}$. Then $X^*(t)$ is a Markov process with the same transition
intensities $q_{ij}$ as $X(t)$, and $X^*(t)$ has, with probability 1, only a finite number
of discontinuities in every finite interval. Hence (see, for example, [1]),

$P(X(t) = j \mid X(0) = i) = P(X^*(t) = j \mid X^*(0) = i)$
$\quad = P(Y\{N(t)\} = j \mid Y(0) = i)$
$\quad = \sum_{k=0}^{\infty} e^{-Qt}\, \dfrac{(Qt)^k}{k!}\, p_{ij}^{(k)}.$

It follows without difficulty that $P(X(t) = j \mid X(0) = i) \to p_j$ as $t \to \infty$.
The $\{p_j\}$ form a probability distribution if and only if $Y(n)$ is ergodic. By a
result of Foster ([2], Theorem 2) this is so if, and only if, there exist
non-negative $y_i$ such that

$\sum_{j \ne i} Q^{-1} q_{ij} y_j + (1 + Q^{-1} q_{ii}) y_i < \infty,$ all i,

and

$\sum_{j \ne i} Q^{-1} q_{ij} y_j + (1 + Q^{-1} q_{ii}) y_i \le y_i - 1,$ all but finitely many i.

This is easily seen to be equivalent to the conditions stated above, and the lemma
is proved.
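
The construction $X^*(t) = Y\{N(t)\}$ used in this proof also gives a practical way to compute transition probabilities for a finite state space. The Python sketch below is an illustration only, under stated assumptions: the state space is finite, Q is taken as one more than the largest exit rate, and the Poisson series is truncated at a heuristic point. The function name `transition_matrix` and the two-state example are hypothetical and do not come from the paper.

```python
import math


def transition_matrix(q, t):
    """Approximate P(X(t) = j | X(0) = i) for a finite intensity matrix q
    from the series sum_k exp(-Q t) (Q t)^k / k! * p_ij^(k)."""
    n = len(q)
    big_q = max(-q[i][i] for i in range(n)) + 1.0  # any Q > -q_ii will do
    # One-step matrix of the chain Y(n): q_ij / Q off the diagonal and
    # 1 + q_ii / Q on it, so that every row sums to one.
    p = [[(1.0 if i == j else 0.0) + q[i][j] / big_q for j in range(n)]
         for i in range(n)]
    mean = big_q * t
    # Heuristic truncation point, far into the Poisson tail.
    # (math.exp(-mean) underflows for very large Q t; not handled here.)
    terms = int(mean + 10.0 * math.sqrt(mean) + 50.0)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = math.exp(-mean)  # Poisson probability of k = 0 jumps
    result = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        # k-step matrix -> (k+1)-step matrix, Poisson weight k -> k+1
        power = [[sum(power[i][l] * p[l][j] for l in range(n))
                  for j in range(n)] for i in range(n)]
        weight *= mean / (k + 1)
    return result


if __name__ == "__main__":
    q = [[-2.0, 2.0], [1.0, -1.0]]  # assumed two-state example
    print(transition_matrix(q, 0.7))
    print(transition_matrix(q, 50.0))
```

For the two-state example the exact value is P(X(t) = 0 | X(0) = 0) = 1/3 + (2/3) e^{-3t}, and for large t every row approaches the limiting distribution (1/3, 2/3), independently of the starting state.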

In order to apply this lemma to the problem in hand, we have to consider the
values of the $q_{ij}$ for this process. The $q_{ij}$ correspond to transitions involving
only one event. Thus, if m, n are the lengths of the two queues, there is a
transition of rate $2\rho$ corresponding to an arrival, which increases the smaller of m, n
by 1. There is also a transition of rate 1 which decreases m by 1, and another
which decreases n by 1. If we restrict $y_{mn}$ to be symmetric, we have to satisfy
the inequalities

$2\rho(y_{mn} - y_{m,n+1}) + (y_{mn} - y_{m-1,n}) + (y_{mn} - y_{m,n-1})(1 - \delta_{n0}) \ge 1 \quad (m \ge n)$

for all but finitely many (m, n). It is easily seen that $y_{mn} = m^2 + n^2$ satisfies
these inequalities for sufficiently large m, so long as $\rho < 1$. Hence we obtain
THEOREM 1. There exists a unique limiting distribution $\{p_{mn}\}$ of the lengths of
the two queues so long as $\rho < 1$.
In all the analysis that follows, we shall confine attention to the case $\rho < 1$,
and to the stationary distribution $\{p_{mn}\}$.
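
The claim that the choice y = m^2 + n^2 satisfies the inequalities for sufficiently large m can be checked directly. The following Python sketch is a numerical illustration, not part of the paper: it evaluates the left side of the inequality for m >= n >= 0 and tests it over a finite range. The function names `drift` and `holds_from` and the tested ranges are assumptions made for the example.

```python
def drift(m, n, rho):
    """Left side of the inequality for y_mn = m^2 + n^2 with m >= n >= 0."""
    def y(a, b):
        return a * a + b * b

    # Arrival (rate 2*rho) joins the shorter queue; when m == n the two
    # possible targets give the same y by symmetry.
    value = 2.0 * rho * (y(m, n) - y(m, n + 1))
    if m > 0:
        value += y(m, n) - y(m - 1, n)  # service completion, longer queue
    if n > 0:
        value += y(m, n) - y(m, n - 1)  # service completion, shorter queue
    return value


def holds_from(rho, m_start, m_stop):
    """True if the inequality drift >= 1 holds for all m_start <= m < m_stop
    and all 0 <= n <= m."""
    return all(drift(m, n, rho) >= 1.0
               for m in range(m_start, m_stop) for n in range(m + 1))


if __name__ == "__main__":
    print(holds_from(0.9, 13, 200))   # traffic below saturation
    print(drift(50, 50, 1.1))         # negative on the diagonal when rho > 1
```

On the diagonal m = n the left side equals 4m(1 - rho) - 2 - 2 rho, which shows why the condition rho < 1 is needed for the inequality to hold for all large m.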


3. The equilibrium equations. These are derived from the Kolmogorov forward
equations in exactly the same way as in Haight’s paper [3], and we shall
not, therefore, go into the details. We note that, by symmetry,

(2) $p_{mn} = p_{nm}.$

With this simplification, the equations become, for all $m \ge n$,

(3) $\left\{\begin{array}{ll} 2\rho & (m = n = 0) \\ 2\rho + 1 & (m > 0,\ n = 0) \\ 2\rho + 2 & (n > 0) \end{array}\right\} p_{mn} = \left\{\begin{array}{ll} 2\rho & (m = n) \\ \rho & (m = n + 1) \\ 0 & (m \ge n + 2) \end{array}\right\} p_{m-1,n} + 2\rho\, p_{m,n-1} + p_{m,n+1} + p_{m+1,n}.$


Now define 
(4) FAxr) = > Pnicak” 
n=) 
Then equations (3) reduce to 
‘x(Qor + 1)Fi(x) — (1 + p)xFo(x) = — Poort 
(5) 4a(2pr + 1)F2(x) — 2(1 + p)xFi(x) + (1 + pr) Fo(z) = poo — prot 
\a( (2px + 1)Fr4i(2) — 2(1 + p)aF,(x) + Fru(x) = priv — prot 
(r = 2,3, 
LEMMA 2.
$$F(x, y) = \sum_{r=0}^{\infty} F_r(x)\, y^r \tag{6}$$
exists in $|x| \le 1$, $|y| < 1 + 2\rho$.

PROOF. Put $x = 1$ in (5) and add the first $r$ equations:
$$(1 + 2\rho)F_{r+1}(1) - F_r(1) = -p_{r0} \le 0 \quad (r \ge 1),$$
so that $F_r(1) \le (1 + 2\rho)^{1-r}F_1(1)$. Hence $|F_r(x)| \le F_r(1) \le F_1(1)(1 + 2\rho)^{1-r}$,
and the lemma follows.
In $|x| \le 1$, $|y| < 1 + 2\rho$, the equations (5) may be combined to give
$$x(2\rho x + 1)\{[F(x, y) - F(x, 0)]/y\} - 2(1 + \rho)xF(x, y) + (1 + \rho)xF(x, 0) + yF(x, y) + \rho xyF(x, 0) = (y - x)F(0, y),$$
or
$$\{x(2\rho x + 1) - 2(1 + \rho)xy + y^2\}F(x, y) = y(y - x)F(0, y) + \{x(2\rho x + 1) - (1 + \rho)xy - \rho xy^2\}F(x, 0). \tag{7}$$
It follows that, whenever $x$ and $y$ satisfy $|x| \le 1$, $|y| < 1 + 2\rho$, and
$$x(2\rho x + 1) - 2(1 + \rho)xy + y^2 = 0, \tag{8}$$
then $y(y - x)F(0, y) = -\{x(2\rho x + 1) - (1 + \rho)xy - \rho xy^2\}F(x, 0)$, which
may be reduced to
$$y(y - x)F(0, y) = -x(2\rho x + 1)\{1 + \rho x - (1 + \rho)y\}F(x, 0). \tag{9}$$
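
The reduction from (7) to (9) relies on substituting (8) for $y^2$. A short numerical check of that step is sketched below; it is illustrative only, the function names are hypothetical, and the sample values of $\rho$ and $x$ are arbitrary.

```python
import cmath

def roots_of_8(x, rho):
    """Both solutions y of x(2*rho*x + 1) - 2(1 + rho)*x*y + y**2 = 0."""
    b = (1 + rho) * x
    d = cmath.sqrt(b * b - x * (2 * rho * x + 1))
    return b + d, b - d

def coefficient_in_7(x, y, rho):
    """Coefficient of F(x, 0) on the right-hand side of (7)."""
    return x * (2 * rho * x + 1) - (1 + rho) * x * y - rho * x * y * y

def coefficient_in_9(x, y, rho):
    """Coefficient of F(x, 0) in (9), up to the common sign."""
    return x * (2 * rho * x + 1) * (1 + rho * x - (1 + rho) * y)

rho, x = 0.6, 0.3 + 0.2j
y1, y2 = roots_of_8(x, rho)
gaps = [abs(coefficient_in_7(x, y, rho) - coefficient_in_9(x, y, rho)) for y in (y1, y2)]
```

Both entries of `gaps` should vanish up to rounding, and the two roots satisfy $y_1 + y_2 = 2(1 + \rho)x$ and $y_1 y_2 = x(2\rho x + 1)$, the relations used to define the correspondence in the next section.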


4. The fundamental correspondence. We may define a symmetric (2-2)
correspondence $S$ as follows:

DEFINITION. $Y = Sy$ if, and only if, there exists an $x$ such that the pairs
$(x, y)$, $(x, Y)$ both satisfy (8). Then $y + Y = 2(1 + \rho)x$, $yY = x(2\rho x + 1)$,
and, eliminating $x$, we obtain
$$\rho Y^2 - \{2(1 + \rho + \rho^2)y - (1 + \rho)\}Y + y(1 + \rho + \rho y) = 0. \tag{10}$$
For a given $y_0$, we define an "S-sequence"
$$\dots, y_{-2}, y_{-1}, y_0, y_1, y_2, \dots$$
such that $y_{n+1} = Sy_n$, $y_{n-1} = Sy_n$.

LEMMA 3. Any S-sequence $\{y_n\}$ is of the form
$$y_n = A + \mu(a\lambda^n + a^{-1}\lambda^{-n}), \tag{11}$$
where
$$A = (1 + \rho)/2(1 + \rho^2), \tag{12}$$
$\lambda$ is the real, positive root, less than unity, of
$$\lambda + \lambda^{-1} = 2(1 + \rho + \rho^2)/\rho, \tag{13}$$
$$\mu = 2^{-3/2}\rho^{1/2}/(1 + \rho^2), \tag{14}$$
and $a$ is an arbitrary complex number.
PROOF. From (10),
$$\rho(y_{n+1} + y_{n-1}) = 2(1 + \rho + \rho^2)y_n - (1 + \rho).$$
Now $2\rho A = 2(1 + \rho + \rho^2)A - (1 + \rho)$, so that
$$(y_{n+1} - A) - [2(1 + \rho + \rho^2)/\rho](y_n - A) + (y_{n-1} - A) = 0.$$
Hence $y_n = A + B\lambda^n + C\lambda^{-n}$ for some $B$, $C$. However, since $y_1 = Sy_0$, we
may put
$$y = A + B + C, \quad Y = A + B\lambda + C\lambda^{-1}$$
in (10), which yields an equation simplifying to
$$BC = \mu^2.$$
Writing $B = \mu a$, $C = \mu a^{-1}$ proves the lemma.


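Two other results which will be used later are:
(i) Since
$$\lambda + \lambda^{-1} + 2 = \frac{2(1 + \rho + \rho^2)}{\rho} + 2 = 2(1 + \rho)^2/\rho = A^2/\mu^2,$$
$$\lambda^{1/2} + \lambda^{-1/2} = A/\mu. \tag{15}$$
(ii) If $|y| \le 1 + \rho$, $|Y| \le 1 + \rho$, the corresponding value of $x$ given by $x =
(y + Y)/2(1 + \rho)$ satisfies
$$|x| \le \frac{|y| + |Y|}{2(1 + \rho)} \le 1.$$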

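
Lemma 3 and result (15) lend themselves to a quick numerical check. The sketch below is an illustration under stated assumptions: $\mu$ is taken as $(\rho/8)^{1/2}/(1 + \rho^2)$, the value for which $BC = \mu^2$ and (15) hold, and all function names and the sample values of $\rho$ and $a$ are hypothetical.

```python
import math

def lemma3_constants(rho):
    """Return (A, lam, mu) as in (12)-(14); mu is the assumed positive root."""
    A = (1 + rho) / (2 * (1 + rho ** 2))
    s = 2 * (1 + rho + rho ** 2) / rho          # lam + 1/lam, from (13)
    lam = (s - math.sqrt(s * s - 4)) / 2        # the root lying in (0, 1)
    mu = math.sqrt(rho / 8) / (1 + rho ** 2)
    return A, lam, mu

def s_sequence(n, a, rho):
    """Term y_n of the S-sequence (11) with parameter a."""
    A, lam, mu = lemma3_constants(rho)
    return A + mu * (a * lam ** n + lam ** (-n) / a)

def equation_10(y, Y, rho):
    """Left-hand side of (10); zero exactly when Y = S y."""
    return (rho * Y * Y - (2 * (1 + rho + rho ** 2) * y - (1 + rho)) * Y
            + y * (1 + rho + rho * y))

rho, a = 0.5, 0.8 + 0.3j
A, lam, mu = lemma3_constants(rho)
residual = abs(equation_10(s_sequence(0, a, rho), s_sequence(1, a, rho), rho))
```

With $\rho = 0.5$ and $a = 1$ the sequence gives $y_0 = 1$ and $y_1 = 2$, and substituting these into (10) gives $2 - 4 + 2 = 0$.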
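5. The univariate generating function $F(0, y)$. Suppose that $Y = Sy$, and that
$|y|, |Y| \le 1 + \rho$. Then the corresponding $x$ has $|x| \le 1$, and we may eliminate
$F(x, 0)$ from (9) to give
$$\frac{YF(0, Y)}{yF(0, y)} = \frac{1 + \rho x - (1 + \rho)Y}{1 + \rho x - (1 + \rho)y}\cdot\frac{y - x}{Y - x} = \frac{1 + \rho x - (1 + \rho)Y}{1 + \rho x - (1 + \rho)y}\cdot\frac{(1 + 2\rho)x - Y}{(1 + 2\rho)x - y}$$
$$= \frac{(1 + \rho x)(1 + 2\rho)x - (1 + \rho)(1 + 2\rho)xY - (1 + \rho x)Y + (1 + \rho)Y^2}{(1 + \rho x)(1 + 2\rho)x - (1 + \rho)(1 + 2\rho)xy - (1 + \rho x)y + (1 + \rho)y^2}$$
$$= \frac{\rho x(1 - x) - (1 - x)Y}{\rho x(1 - x) - (1 - x)y} = \frac{Y - \rho x}{y - \rho x} = \frac{(2 + \rho)Y - \rho y}{(2 + \rho)y - \rho Y}.$$
According to Lemma 3, we may write
$$y = A + \mu(z + z^{-1}), \quad Y = A + \mu(\lambda z + \lambda^{-1}z^{-1}), \tag{16}$$
and we may define
$$g(z) = yF(0, y). \tag{17}$$
Then
$$\frac{g(\lambda z)}{g(z)} = \frac{(2 + \rho)Y - \rho y}{(2 + \rho)y - \rho Y} = \frac{2A/\mu + \{(2 + \rho)\lambda - \rho\}z + \{(2 + \rho)\lambda^{-1} - \rho\}z^{-1}}{2A/\mu + \{2 + \rho - \rho\lambda\}z + \{2 + \rho - \rho\lambda^{-1}\}z^{-1}}.$$
Equation (15) implies that $z + \lambda^{-1/2}$ is a factor of both numerator and denominator,
so that
$$\frac{g(\lambda z)}{g(z)} = \frac{\gamma - \lambda^{1/2}z}{\lambda^{1/2}\gamma z - 1}, \tag{18}$$
where
$$\gamma = \frac{2 + \rho - \rho\lambda}{\rho - (2 + \rho)\lambda} > 1. \tag{19}$$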


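Equation (18) is valid in $|A + \mu(z + z^{-1})|, |A + \mu(\lambda z + \lambda^{-1}z^{-1})| \le 1 + \rho$.
Now, if $\lambda^{1/2} \le |z| \le \lambda^{-1/2}$, then
$$|A + \mu(z + z^{-1})| \le A + \mu|z| + \mu|z|^{-1} \le A + \mu(\lambda^{1/2} + \lambda^{-1/2}) = 2A = \frac{1 + \rho}{1 + \rho^2} < 1 + \rho.$$
Hence (18) is valid in
$$\lambda^{1/2} - \delta_1 < |\lambda z| < |z| < \lambda^{-1/2} + \delta_2, \quad \text{for some } \delta_1, \delta_2 > 0,$$
and hence in
$$\lambda^{-1/2} - \delta < |z| < \lambda^{-1/2} + \delta, \quad \text{for some } \delta > 0.$$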
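$g(z)$ is regular in this annulus, and may therefore be expanded in a Laurent
series
$$g(z) = \sum_{n=-\infty}^{\infty} a_n z^n. \tag{20}$$
Hence $(\lambda^{1/2}\gamma z - 1)\sum a_n\lambda^n z^n = (\gamma - \lambda^{1/2}z)\sum a_n z^n$,
$$a_n(\lambda^n + \gamma) = a_{n-1}\lambda^{1/2}(\lambda^{n-1}\gamma + 1),$$
$$a_n = a_{-n} = a_0(\lambda^{1/2}\gamma^{-1})^n\prod_{j=1}^{n}\frac{1 + \lambda^{j-1}\gamma}{1 + \lambda^{j}\gamma^{-1}}. \tag{21}$$
This defines $g(z)$ uniquely except for a multiplying factor. Since $a_n \sim C(\lambda^{1/2}\gamma^{-1})^{|n|}$
as $|n| \to \infty$, $g(z)$ is regular in $\lambda^{1/2}\gamma^{-1} < |z| < \lambda^{-1/2}\gamma$. Equation (18) may be written as
$$g(\lambda z) = \frac{\gamma - \lambda^{1/2}z}{\lambda^{1/2}\gamma z - 1}\,g(z),$$
which may be regarded as defining a function regular in $\lambda^{3/2}\gamma^{-1} < |z| < \lambda^{1/2}\gamma$,
except for a pole at $z = \lambda^{1/2}\gamma^{-1}$, and coinciding with $g(z)$ in $\lambda^{1/2}\gamma^{-1} < |z| < \lambda^{1/2}\gamma$.
Hence $g(z)$ can be continued into
$$\lambda^{3/2}\gamma^{-1} < |z| < \lambda^{-1/2}\gamma$$
except for a pole at $z = \lambda^{1/2}\gamma^{-1}$. Repeating this procedure, we can continue $g(z)$
over the whole unit disc, excluding $z = 0$, as a regular function except for poles
at
$$z = \lambda^{n+1/2}\gamma^{-1} \quad (n = 0, 1, 2, \dots).$$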
This proves 
THEOREM 2. $F(0, y)$ can be continued to a meromorphic function over the whole
$y$-plane. Its poles are at the points $Y_n$, $n = 0, 1, 2, \dots$, where
$$Y_n = A + \mu(\lambda^{n+1/2}\gamma^{-1} + \lambda^{-n-1/2}\gamma). \tag{22}$$
It is an easy matter to show that $Y_n$ takes its smallest value at $Y_0 = (2 + \rho)/\rho^2$.
Then, from the fact that, for some $C$, $F(0, y) - [C/(y - Y_0)]$ is regular in
$|y| < Y_1$, we obtain the

COROLLARY.
$$p_{n0} \sim C[\rho^2/(2 + \rho)]^n \quad \text{as } n \to \infty. \tag{23}$$
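
The location of the poles in Theorem 2 can be evaluated directly. The following sketch is illustrative only: it assumes $\gamma = (2 + \rho - \rho\lambda)/(\rho - (2 + \rho)\lambda)$ and $\mu = (\rho/8)^{1/2}/(1 + \rho^2)$, and the function name `pole_data` and the choice $\rho = 0.5$ are hypothetical.

```python
import math

def pole_data(rho, count=5):
    """Return (lam, gamma, [Y_0, ..., Y_{count-1}]) from (13), (19) and (22)."""
    A = (1 + rho) / (2 * (1 + rho ** 2))
    s = 2 * (1 + rho + rho ** 2) / rho
    lam = (s - math.sqrt(s * s - 4)) / 2
    mu = math.sqrt(rho / 8) / (1 + rho ** 2)
    gamma = (2 + rho - rho * lam) / (rho - (2 + rho) * lam)
    poles = [A + mu * (lam ** (n + 0.5) / gamma + gamma * lam ** (-n - 0.5))
             for n in range(count)]
    return lam, gamma, poles

rho = 0.5
lam, gamma, poles = pole_data(rho)
decay_rate = 1 / poles[0]          # geometric rate of p_{n0} suggested by (23)
```

For $\rho = 0.5$ the smallest pole evaluates to $Y_0 = 10 = (2 + \rho)/\rho^2$, so `decay_rate` equals $\rho^2/(2 + \rho) = 0.1$, and the later poles are much further from the origin.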


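Let the residue of $g(z)$ at $\lambda^{n+1/2}\gamma^{-1}$ be $g_n$. Then $g(\lambda^{n+1/2}\gamma^{-1} + \zeta) = g_n/\zeta + O(1)$.
From (18),
$$g(\lambda^{n+1/2}\gamma^{-1} + \lambda\zeta) = \frac{\gamma - \lambda^n\gamma^{-1} - \lambda^{1/2}\zeta}{\lambda^n - 1 + \lambda^{1/2}\gamma\zeta}\,g(\lambda^{n-1/2}\gamma^{-1} + \zeta),$$
$$\frac{g_n}{\lambda\zeta} = -\gamma\,\frac{1 - \lambda^n\gamma^{-2}}{1 - \lambda^n}\cdot\frac{g_{n-1}}{\zeta} + O(1),$$
so that
$$g_n = g_0(-\lambda\gamma)^n\prod_{j=1}^{n}\frac{1 - \lambda^j\gamma^{-2}}{1 - \lambda^j}. \tag{24}$$
From this it is easy to see that the residue at $y = Y_n$ of $F(0, y)$ is $\phi_n$, where
$$\frac{\phi_n}{\phi_0} = \frac{Y_0}{Y_n}\,(-\lambda^{-1}\gamma)^n\,\frac{1 - \lambda^{2n+1}\gamma^{-2}}{1 - \lambda\gamma^{-2}}\prod_{j=1}^{n}\frac{1 - \lambda^j\gamma^{-2}}{1 - \lambda^j}. \tag{25}$$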


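LEMMA 4. Let $C_n$ be the contour in the $y$-plane corresponding to $|z| = \lambda^{n+1/2}$. Then
$$\sup_{C_n}|y^{-1}F(0, y)| \to 0, \quad \text{as } n \to \infty.$$

PROOF. Since $y \sim \mu/z$ as $z \to 0$ it is sufficient to prove that
$$G_n = \sup_\theta|(\lambda^{n+1/2}e^{i\theta})^2 g(\lambda^{n+1/2}e^{i\theta})| \to 0.$$
From (18),
$$G_n \le \lambda^2\gamma\,\sup_\theta\left|\frac{1 - \lambda^n\gamma^{-1}e^{i\theta}}{1 - \lambda^n\gamma e^{i\theta}}\right|G_{n-1} \le \lambda^2\gamma\,\frac{1 + \lambda^n\gamma^{-1}}{|1 - \lambda^n\gamma|}\,G_{n-1},$$
$$G_n \le G_0(\lambda^2\gamma)^n\prod_{j=1}^{n}\frac{1 + \lambda^j\gamma^{-1}}{|1 - \lambda^j\gamma|} \to 0,$$
provided that $\lambda^2\gamma < 1$. But, if $\lambda^2\gamma \ge 1$, then
$$\lambda^2(2 + \rho - \rho\lambda) \ge \rho - (2 + \rho)\lambda, \quad (2 + \rho)\lambda(1 + \lambda) \ge \rho(1 + \lambda^3),$$
$$[(2 + \rho)/\rho] \ge \lambda + \lambda^{-1} - 1 = [2(1 + \rho + \rho^2)/\rho] - 1.$$
The contradiction establishes the lemma.

THEOREM 3.
$$F(0, y) = F(0, 0) + y\sum_{r=0}^{\infty}\frac{\phi_r}{Y_r(y - Y_r)}. \tag{26}$$

PROOF. $y^{-1}F(0, y)$ is meromorphic with poles at $y = 0$ (with residue $F(0, 0)$)
and at $y = Y_r$ (with residue $\phi_r Y_r^{-1}$). By virtue of Lemma 4 we may apply
Cauchy's partial fraction theorem to give (26).

COROLLARY.
$$p_{n0} = -\sum_{r=0}^{\infty}\phi_r/Y_r^{n+1} \quad (n \ge 1). \tag{27}$$
Putting $m = n = 0$ in (3) gives
$$p_{00} = \rho^{-1}p_{10}. \tag{28}$$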
6. The univariate generating function $F(x, 0)$. Equation (9) gives
$$x(2\rho x + 1)F(x, 0) = -y(y - x)F(0, y)/\{1 + \rho x - (1 + \rho)y\}, \tag{29}$$
where $(x, y)$ satisfy (8). Hence $F(x, 0)$ is an analytic function except when
$y = Y_n$, or when $1 + \rho x = (1 + \rho)y$. This last equation is satisfied only when
$x = 1$ or $x = 1/\rho^2$. Now define $X_n$ as the value of $x$ such that $(X_n, Y_{n-1})$ and
$(X_n, Y_n)$ both satisfy (8). Then $X_0 = 1/\rho^2$, and it follows that the poles of
$F(x, 0)$ are exactly at
$$X_n = (Y_{n-1} + Y_n)/2(1 + \rho) = A(2 + \lambda^n\gamma^{-1} + \lambda^{-n}\gamma)/2(1 + \rho) \quad (n = 0, 1, \dots). \tag{30}$$

Now (29) enables us to find the residue $\psi_n$ of $F(x, 0)$ at $x = X_n$, namely
$$\psi_n = -\frac{A\lambda^{-2n-1/2}\gamma^2(X_n - Y_n)(1 - \lambda^{2n}\gamma^{-2})\,g_n}{2(1 + \rho)X_n(2\rho X_n + 1)\{1 + \rho X_n - (1 + \rho)Y_n\}}. \tag{31}$$

It is also clear that the supremum of $F(x, 0)$ on the contour in the $x$-plane
corresponding to $|z| = \lambda^{n+1/2}$ tends to zero as $n \to \infty$. Hence, as in Theorem 3, we
have

THEOREM 4.
$$F(x, 0) = \sum_{r=0}^{\infty}\frac{\psi_r}{x - X_r}, \tag{32}$$
$$p_{nn} = -\sum_{r=0}^{\infty}\psi_r/X_r^{n+1} \tag{33}$$
$$\sim C\rho^{2n} \tag{34}$$
as $n \to \infty$, for some $C$.


7. The bivariate generating function $F(x, y)$. By equation (7), $F(x, y)$ has
singularities only on the planes $x = X_r$ and $y = Y_r$. We may therefore prove

THEOREM 5.
$$p_{mn} \sim C[\rho^{2m}/(2 + \rho)^{m-n}] \tag{35}$$
as $m, n \to \infty$ in $m \ge n$, for some $C$.

PROOF. Since the nearest singularities to the origin are at $x = X_0$ and $y = Y_0$,
$$p_{n+r,n} \sim CX_0^{-n}Y_0^{-r}.$$
Putting $X_0 = 1/\rho^2$, $Y_0 = (2 + \rho)/\rho^2$ proves the result.
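
The last step of this proof is a piece of arithmetic that is easy to confirm by machine. The sketch below is illustrative only: it assumes $X_0 = 1/\rho^2$ and $Y_0 = (2 + \rho)/\rho^2$ as stated, and the function name `asymptotic_two_forms` and the sample arguments are hypothetical.

```python
def asymptotic_two_forms(m, n, rho, C=1.0):
    """Evaluate C * X0**(-n) * Y0**(-(m - n)) and the closed form in (35), for m >= n."""
    X0 = 1 / rho ** 2
    Y0 = (2 + rho) / rho ** 2
    r = m - n                                   # excess of the longer queue
    from_poles = C * X0 ** (-n) * Y0 ** (-r)
    closed_form = C * rho ** (2 * m) / (2 + rho) ** (m - n)
    return from_poles, closed_form

from_poles, closed_form = asymptotic_two_forms(m=9, n=4, rho=0.7)
```

The two returned values coincide because $X_0^{-n}Y_0^{-r} = \rho^{2n}\rho^{2r}(2 + \rho)^{-r}$ with $n + r = m$.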
As in the two previous sections, we could make a detailed investigation of the
properties of $F(x, y)$. However, much of the interest in a queueing system lies
with the waiting time distribution, and it will be shown in the next section that
this may be determined simply from a knowledge of $F(x, 0)$.


8. The waiting time distribution. The waiting time of a customer depends on
the length of the queue he joins, i.e., on
$$l = \min(m, n). \tag{36}$$
Now $E(z^l) = \sum_{l=0}^{\infty} z^l\{p_{ll} + 2\sum_{n=0}^{l-1}p_{ln}\} = 2F(z, z) - F(z, 0)$. In (7) put
$x = y = z$. Then
$$z(1 - z)F(z, z) = z(1 - z)(1 + \rho z)F(z, 0),$$
so that $F(z, z) = (1 + \rho z)F(z, 0)$, and $E(z^l) = (1 + 2\rho z)F(z, 0)$. Hence the
distribution $\{p_l\}$ of $l$ is given by
$$p_l = p_{ll} + 2\rho p_{l-1,l-1}, \tag{37}$$
and is determined from Theorem 4.

The distribution of waiting time is then made up of a component of zero
waiting time with probability $p_0$, together with an absolutely continuous
component with density
$$f(w) = \sum_{l=1}^{\infty} p_l\,\frac{w^{l-1}e^{-w}}{\Gamma(l)}. \tag{38}$$

9. The one-pole approximation. It follows from (22) that, for $n$ large,
$Y_n \sim \mu\gamma\lambda^{-n-1/2}$, and from (25) that $\phi_n \sim C(-\lambda^{-1}\gamma)^n Y_n^{-1}$. Hence the $r$th term in
the series (27) for $p_{n0}$ is of order
$$(\lambda^{n+1}\gamma)^r \quad \text{as } r \to \infty.$$
For all $\rho$, $\lambda + \lambda^{-1} \ge 6$, and hence $\lambda \le 3 - 2\sqrt{2} \approx 0.17$. Hence, for $n$ fairly
large, $\lambda^{n+1}\gamma$ will be very small, and we can safely neglect all but a few terms of
the series. Even for $n = 1$ (when $\lambda^2\gamma$ decreases from 1 to 0.17 as $\rho$ increases from
0 to 1) this will be valid so long as $\rho$ is not too small.

Hence, in fairly heavy traffic, we may obtain a reasonable approximation by
taking only the first term of the series for $p_{n0}$. Similar remarks hold for the other
series, so that
$$p_{mn} \approx C\,\frac{\rho^{2m}}{(2 + \rho)^{m-n}} \tag{39}$$
for some $C$. Equation (37) then shows that
$$p_0 \approx C, \qquad p_l \approx C(1 + 2\rho^{-1})\rho^{2l} \quad (l > 0). \tag{40}$$
Thus we are led to

THEOREM 6. In heavy traffic the distribution of waiting time is approximately
the same as for a single queue with traffic intensity $\rho^2$.
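
The link between geometric queue-length weights and an exponential waiting-time density, which underlies Theorem 6, can be illustrated numerically. This is a minimal sketch under stated assumptions: the weights are taken as $p_l = c\,r^l$ for $l > 0$ with $r = \rho^2$, the constant `c` is arbitrary, and the function name `waiting_density` is hypothetical.

```python
import math

def waiting_density(w, p):
    """f(w) = sum over l >= 1 of p[l] * w**(l-1) * exp(-w) / (l-1)!  (Erlang mixture)."""
    total = 0.0
    for l in range(1, len(p)):
        total += p[l] * w ** (l - 1) * math.exp(-w) / math.factorial(l - 1)
    return total

rho = 0.8
r = rho ** 2                         # ratio of successive weights (assumed geometric)
c = 0.3                              # arbitrary constant, for illustration only
p = [c] + [c * r ** l for l in range(1, 120)]
w = 2.5
numeric = waiting_density(w, p)
closed_form = c * r * math.exp(-(1 - r) * w)   # exponential density with rate 1 - rho**2
```

The truncated sum `numeric` matches `closed_form`, an exponential density with rate $1 - \rho^2$; this is the form the waiting time takes in a single exponential-server queue with traffic intensity $\rho^2$.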


10. Related problems. Haight [3] also considered the case in which a customer 
is permitted to change queues if by so doing he could improve his position. Under 
the symmetry conditions that have been imposed in this paper, this process is 
equivalent, from the point of view of the total number queueing, to a single queue 
with two servers. The determination of the waiting time distribution is, how- 
ever, no longer a simple matter, since the order in either queue is not necessarily 
the order of arrival. 

The problem considered in this paper is an example of a random walk on posi- 
tive integer pairs, with rather complicated boundary conditions. The method of 
attack used may be generalized to deal with other problems of this sort, and it 
is hoped to publish an account of this work elsewhere. 

This same method, together with the use of the Laplace transform, may also 


be used to study the transient behavior of the double queue and of other random 
walks. 


11. Acknowledgments. I am indebted to Professor D. V. Lindley and Dr. P. 
Whittle for their helpful suggestions, and to the Department of Scientific and 
Industrial Research for a research studentship. 


REFERENCES 
[1] WILLIAM FELLER, "On boundary conditions for the Kolmogorov differential equations,"
Ann. Math., Vol. 65 (1957), pp. 527-570.
[2] F. G. FOSTER, "On stochastic matrices associated with various queueing processes,"
Ann. Math. Stat., Vol. 24 (1953), pp. 355-360.
[3] F. A. HAIGHT, "Two queues in parallel," Biometrika, Vol. 45 (1958), pp. 401-410.





QUEUES WITH BATCH DEPARTURES I 


By F. G. FOSTER AND K. M. NYUNT
London School of Economics 


1. Introduction. This paper has a pattern closely similar to that of [4]. The 
following single-server queueing system is considered. 

(i) Units arrive at the sequence of instants $\tau_1, \tau_2, \dots$, such that the
inter-arrival times, $\theta_n = \tau_{n+1} - \tau_n > 0$ $(n = 1, 2, \dots)$, are identically distributed
independent random variables with an exponential distribution function,
$$F(x) = P[\theta_n \le x] = 1 - e^{-\lambda x} \quad (x \ge 0).$$
Put $a = \int_0^\infty x\,dF(x)$. Then $\lambda = 1/a$.

(ii) Units are served in batches of exactly $k$ units by a single server, in order
of arrival. Denote by $\chi_n$ the service time of the $n$th batch to be served. We suppose
that $\{\chi_n\}$ $(n = 1, 2, \dots)$ is a sequence of identically distributed independent
positive random variables, independent also of the sequence $\{\tau_n\}$, with common
distribution function, $H(x) = P[\chi_n \le x]$. Put $\psi(s) = \int_0^\infty e^{-sx}\,dH(x)$,
$\beta = \int_0^\infty x\,dH(x)$ and $\mu = 1/\beta$. Define $\rho = \lambda/\mu$.

In the terminology of Foster [3], this system can alternatively be described as 
having the 1-input (arrivals) untriggered with input quantity constantly unity, 
and an exponential distribution for the 1-input time. The 0-input (departures) 
is triggered with input quantity constantly k, and a general distribution for the
0-input time. The system has infinite capacity. For definitions of these concepts, 
the reader is referred to [3]. 

Such a batch-size model does not appear to have been treated explicitly in the 
literature, although it has obvious applications. A simple special case of it is, 
however, implicit in the work of Jackson and Nichols [5]. These authors suppose 
that an inter-arrival time devoted to one unit is composed of k consecutive 
phases, each exponentially distributed. If instead, we think of this unit as com- 
posed of k subunits (corresponding to the phases of arrival) then we have the 
idea of batch service: Jackson and Nichols treat the special case of exponential 
service times. 

Justification for the explicit consideration of batch departure systems resides 
in the fact that the results one can obtain are elegant, and form a natural generali- 
zation of the case of unit departures, as treated, for example, in Kendall [6]. 
The analysis in this paper is similar to that in Bailey [1], but the model is in fact 
different, and the results obtained here are new. In the terminology of Foster 
[3], the model Bailey considered differs from the present one in that the 0-input 
in Bailey’s model is untriggered with controlled input quantity of zero to k units, 
depending upon the state of the system: the input being, for example, virtual
when the system contains no 1's. In other words, service begins from time to time
whether or not there happen to be any units in the system. Bailey obtains for
this system the equilibrium distribution of queue-size at instants just before 
service is due to begin. 

Denote by $\xi(t)$ the number of units in the system, including the batch under
service, at the instant $t$. Let $\sigma_n$ be the instant at which the $n$th batch (of size $k$)
departs from the system on receiving service. Put $\xi_n = \xi(\sigma_n + 0)$, $n = 1, 2, \dots$.
We shall determine the probability generating function (p.g.f.) of the limiting
distribution,
$$p_j^+ = \lim_{n\to\infty} P[\xi_n = j].$$
The distribution $\{p_j^+\}$ exists and is independent of the initial state of the system,
if, and only if, $\rho/k < 1$. The proof of this statement follows the same lines as in
the case, $k = 1$, as given in Foster [2].

Let us denote this batch departures model by $E_1/G^k/1$ where $E_1$ indicates an
exponentially distributed inter-arrival time, and $G^k$ indicates that the service
time has a general distribution, and that service is in batches of $k$ units. We shall
consider its relation to the unit departures model, $E_k/G/1$, where $E_k$ indicates an
Erlang distribution with parameter $k$. We shall derive the equilibrium distribution
of queue-size at instants just after departures for this latter system in
terms of $\{p_j^+\}$. As a special case we shall consider the system, $E_k/E_r/1$. In our
previous paper [4] we obtained the equilibrium distribution at instants just before
arrivals for the same system. In this paper we shall establish the identity of the
two formulae, thus verifying a special case of the general proposition that, for the
system $G/G/1$, when these exist, the equilibrium distribution of queue-size at
instants just after departures is identical with that at instants just before arrivals
(cf. Khintchine [7]).


2. The system $E_1/G^k/1$. Let $\{\nu_n\}$ $(n = 1, 2, \dots)$ be a sequence of identically
distributed independent random variables with distribution,
$$k_j = P[\nu_n = j], \quad j = 0, 1, 2, \dots,$$
where
$$k_j = \int_0^\infty \frac{e^{-\lambda x}(\lambda x)^j}{j!}\,dH(x).$$
Then $\nu_n$ is thought of as the number of units joining the queue during the
service-time of the $n$th batch.

Put $\kappa(z) = \sum_{j=0}^{\infty} k_j z^j$. We note that $\kappa(z) = \psi\{\lambda(1 - z)\}$. We assume that
$\rho/k < 1$, and also that $\kappa(z)$ is regular within the circle $|z| = 1 + \delta$, where $\delta$ is
some small positive number. This implies a slight restriction on the distribution,
$H(x)$, which will always be satisfied in practice. It follows from Rouché's theorem
that the equation,
$$\kappa(z) = z^k, \tag{1}$$
has exactly $k$ roots inside or on the unit circle. For $\kappa'(1) = \rho < k$, so that for
some small positive $\delta$, $\kappa(1 + \delta) < (1 + \delta)^k$. Therefore, on the circle, $|z| = 1 + \delta$,
$|\kappa(z)| \le \sum k_j|z|^j < (1 + \delta)^k = |z^k|$. Clearly, $z = 1$ is one root, and it is a simple
root. Denote the other $k - 1$ roots by $\delta_1, \delta_2, \dots, \delta_{k-1}$.

Define $P^+(z) = \sum_{j=0}^{\infty} p_j^+ z^j$.

THEOREM 1.
$$P^+(z) = \frac{(k - \rho)(z - 1)\prod_{j=1}^{k-1}(z - \delta_j)/(1 - \delta_j)}{z^k/\kappa(z) - 1}. \tag{2}$$
PROOF. The process, $\{\xi_n\}$, is a Markov process with transition matrix described
by the relations:
$$\xi_{n+1} = \max[\xi_n - k, 0] + \nu_{n+1}.$$
The random variable, $\max[\xi_n - k, 0]$, has, in the limit as $n \to \infty$, the generating
function,
$$P^+(z)z^{-k} - \sum_{j=0}^{k-1} p_j^+(z^{j-k} - 1).$$
Therefore, since $\nu_{n+1}$ and $\max[\xi_n - k, 0]$ are independent, we have the relation,
$$P^+(z) = \Big\{P^+(z)z^{-k} - \sum_{j=0}^{k-1} p_j^+(z^{j-k} - 1)\Big\}\kappa(z),$$
which, on simplifying, reduces to
$$P^+(z) = \frac{\sum_{j=0}^{k-1} p_j^+(z^k - z^j)}{z^k/\kappa(z) - 1}.$$
Since $P^+(z)$ is a probability generating function, it is absolutely convergent
in the region, $|z| \le 1$. Therefore, the roots of the numerator in this region must
coincide with those of the denominator, and the latter are
$$1, \delta_1, \delta_2, \dots, \delta_{k-1}.$$
Therefore, since the numerator is a polynomial of degree $k$, we have
$$\sum_{j=0}^{k-1} p_j^+(z^k - z^j) = C(z - 1)\prod_{j=1}^{k-1}(z - \delta_j),$$
where $C$ is a constant to be determined. Using $P^+(1) = 1$, we find that
$$C = (k - \rho)\Big/\prod_{j=1}^{k-1}(1 - \delta_j),$$
and (2) follows.
If (1) has any roots outside the unit circle, we shall denote them by
$$\beta_1, \beta_2, \dots$$
and we define $\epsilon_j = 1/\beta_j$.

EXAMPLE 1. If the service-time distribution is Erlang, $E_r$, then
$$\kappa(z) = \{1 + [\rho(1 - z)/r]\}^{-r},$$
and the denominator of (2) becomes
$$z^k\{1 + [\rho(1 - z)/r]\}^r - 1,$$
which, being a polynomial of degree $k + r$, has precisely the $(k + r)$ zeros,
$$1, \delta_1, \delta_2, \dots, \delta_{k-1}, \beta_1, \beta_2, \dots, \beta_r,$$
and so can be expressed as
$$C(z - 1)\prod_{j=1}^{k-1}(z - \delta_j)\prod_{j=1}^{r}(z - \beta_j),$$
where $C$ is a constant to be determined. Therefore, substituting in (2) and
normalizing, we obtain
$$P^+(z) = \prod_{j=1}^{r}\Big(\frac{1 - \epsilon_j}{1 - \epsilon_j z}\Big), \tag{3}$$
where the $\epsilon_j$'s are the reciprocals of the roots outside the unit circle of
the equation,
$$\{1 + [\rho(1 - z)/r]\}^r = z^{-k}. \tag{4}$$


We can show that these roots, and hence the $\epsilon_j$'s, are distinct. For suppose,
on the contrary, that (4) has a double root, say, $a$. Then for $z = a$, we should
have, by differentiation of (4),
$$\rho\{1 + [\rho(1 - z)/r]\}^{r-1} = kz^{-k-1}. \tag{5}$$
Dividing (4) by (5) and simplifying, we obtain
$$a = (k/\rho)[(r + \rho)/(r + k)].$$
But this value must now satisfy (4); that is
$$[(r + \rho)/(r + k)]^{r+k} = (\rho/k)^k. \tag{6}$$
Now we are assuming that the traffic intensity, $\rho/k$, is less than unity, say $\rho/k =
1 - \delta$, where $0 < \delta < 1$. Substituting in (6) and putting $b = k/(r + k)$, we get
after simplifying,
$$1 - b\delta = (1 - \delta)^b.$$
But this is impossible, unless $\delta = 0$. Therefore, (4) has no multiple roots, and
so the $\epsilon_j$'s are distinct.
It follows that we can write
$$P^+(z) = \sum_{j=1}^{r} C_j/(1 - \epsilon_j z), \tag{7}$$
where
$$C_j = (1 - \epsilon_j)\prod_{i \ne j}(1 - \epsilon_i)/(1 - \epsilon_i/\epsilon_j). \tag{8}$$
Formula (3) gives the p.g.f. of the equilibrium distribution after departures
for the system, $E_1/E_r^k/1$. The traffic intensity is $\rho/k$. We note here for later use
that if the traffic intensity is changed to $r\rho/k$, the $\epsilon_j$'s become the reciprocals of
the distinct roots outside the unit circle of the equation,
$$[1 + \rho(1 - z)]^r = z^{-k}. \tag{9}$$


EXAMPLE 2. If the service-time distribution is exponential, $E_1$, we have $\kappa(z) =
\{1 + [\rho(1 - z)]\}^{-1}$, and (3) becomes
$$P^+(z) = (1 - \epsilon)/(1 - \epsilon z), \tag{10}$$
where $\epsilon$ is the reciprocal of the root outside the unit circle of
$$\{1 + [\rho(1 - z)]\}z^k = 1. \tag{11}$$
The formula (10) is for the system $E_1/E_1^k/1$, with traffic intensity, $\rho/k$.
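
The root required in Example 2 is easy to locate numerically. The sketch below is one possible approach, not the authors' procedure: it assumes $\rho < k$, brackets the root of (11) between 1 and $(1 + \rho)/\rho$, and uses plain bisection; the function names are hypothetical.

```python
def outside_root(rho, k, iterations=200):
    """Real root beta > 1 of (1 + rho*(1 - z)) * z**k = 1, assuming rho < k."""
    def f(z):
        return (1 + rho * (1 - z)) * z ** k - 1
    # f(1) = 0 with slope k - rho > 0, so f is positive just above 1,
    # while f((1 + rho)/rho) = -1; the outside root lies in between.
    lo, hi = 1 + 1e-6, (1 + rho) / rho
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("bracketing failed; check that rho < k")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def p_plus(z, eps):
    """The geometric p.g.f. (10)."""
    return (1 - eps) / (1 - eps * z)

beta = outside_root(rho=1.2, k=3)
eps = 1 / beta
```

For $k = 1$ equation (11) factorizes as $(z - 1)(1 - \rho z) = 0$, so the sketch returns $\beta = 1/\rho$ and $\epsilon = \rho$, which is the familiar geometric distribution for the single-server queue with unit departures.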


3. Relationship with the unit departures system $E_k/G/1$. If $\xi$ is the number of
units left by an arbitrary departing batch, then the number of complete batches
of size $k$ in the system at this instant will be
$$\zeta = [\xi/k],$$
where $[x]$ denotes the greatest integer not greater than $x$.

We now interpret the random variable $\zeta$ as the number of units left by an
arbitrary departing unit in the unit departures system $E_k/G/1$, which has mean
service time $1/\mu$ and Erlang inter-arrival time distribution, $E_k$, with a mean of
$k/\lambda$. The traffic intensity is thus $\rho/k$.

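We consider the distribution of $\zeta$. Define
$$q_j^+ = P[\zeta = j],$$
and put
$$Q^+(z) = \sum_{j=0}^{\infty} q_j^+ z^j. \tag{12}$$
Now define
$$Q_j^+ = \sum_{i=0}^{j} q_i^+, \qquad P_j^+ = \sum_{i=0}^{j} p_i^+,$$
and
$$\sum_{j=0}^{\infty} Q_j^+ z^j = \mathcal{Q}^+(z). \tag{13}$$
We have
$$Q_0^+ = P_{k-1}^+, \qquad Q_1^+ = P_{2k-1}^+,$$
and generally
$$Q_j^+ = P_{(j+1)k-1}^+.$$
Therefore,
$$Q_j^+ = P_{(j+1)k-1}^+ = \frac{1}{2\pi i}\oint_C \frac{P^+(v)\,dv}{(1 - v)v^{(j+1)k}},$$
where $C$ is a contour around the origin excluding the poles of $P^+(v)/(1 - v)$.
Therefore,
$$\mathcal{Q}^+(z) = \sum_{j=0}^{\infty}\frac{z^j}{2\pi i}\oint_C \frac{P^+(v)\,dv}{(1 - v)v^{(j+1)k}}.$$
But from (12) we have
$$\frac{Q^+(z)}{1 - z} = \sum_{j=0}^{\infty}\frac{z^j}{2\pi i}\oint_C \frac{P^+(v)\,dv}{(1 - v)v^{(j+1)k}},$$
so that
$$Q^+(z) = \frac{1 - z}{2\pi i}\oint_C \frac{P^+(v)\,dv}{(1 - v)v^k(1 - v^{-k}z)}. \tag{14}$$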

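The poles of the integrand within $C$ are at
$$v = \omega^j z^{1/k} \quad (j = 0, 1, \dots, k - 1),$$
where $\omega^j$ is a $k$th root of unity. The residue at $v = \omega^j z^{1/k}$ is
$$(1/zk)\,P^+(\omega^j z^{1/k})\,\omega^j z^{1/k}/(1 - \omega^j z^{1/k}).$$
Therefore, summing the residues, we obtain the alternative formula,
$$Q^+(z) = \frac{1 - z}{zk}\sum_{j=0}^{k-1}\frac{P^+(\omega^j z^{1/k})\,\omega^j z^{1/k}}{1 - \omega^j z^{1/k}}. \tag{15}$$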
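
Formula (15) can be tried out on the geometric case, where a closed form is available for comparison. The sketch below is illustrative only: it assumes $P^+(z) = (1 - \epsilon)/(1 - \epsilon z)$, takes $\omega = e^{2\pi i/k}$, and uses hypothetical names and sample values.

```python
import cmath

def q_plus_via_roots_of_unity(p_plus, z, k):
    """Evaluate (15): average of P+(w) * w / (1 - w) over the k-th roots w of z."""
    w0 = z ** (1.0 / k)                      # principal k-th root of a real z in (0, 1)
    total = 0
    for idx in range(k):
        w = w0 * cmath.exp(2j * cmath.pi * idx / k)
        total += p_plus(w) * w / (1 - w)
    return (1 - z) / (z * k) * total

eps, k, z = 0.4, 3, 0.3
value = q_plus_via_roots_of_unity(lambda v: (1 - eps) / (1 - eps * v), z, k)
expected = (1 - eps ** k) / (1 - eps ** k * z)   # closed form for geometric P+
```

The sum over roots of unity retains only every $k$th coefficient of $P^+(v)/(1 - v)$, which is why `value` reduces to a geometric p.g.f. in $\epsilon^k$.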


EXAMPLE 3. If the service-time distribution is Erlang, $E_r$, then $P^+(z)$ is
given by (7), and so from (14),
$$Q^+(z) = \sum_{j=1}^{r}\frac{1}{2\pi i}\oint_C \frac{C_j(1 - z)\,dv}{(1 - \epsilon_j v)(1 - v)v^k(1 - v^{-k}z)}. \tag{16}$$
This is the p.g.f. of the equilibrium distribution of queue-size after departures
in the system $E_k/E_r/1$. For traffic intensity $r\rho/k$, the $\epsilon_j$'s are the reciprocals of
the roots outside the unit circle of Equation (9).
Now we have
$$\frac{1}{2\pi i}\oint_C \frac{C_j(1 - z)\,dv}{(1 - \epsilon_j v)(1 - v)v^k(1 - v^{-k}z)} = \frac{1}{2\pi i}\oint_C \frac{C_j}{(1 - \epsilon_j v)(1 - v)}\Big\{1 - \frac{1 - v^{-k}}{1 - v^{-k}z}\Big\}dv$$
$$= \frac{1}{2\pi i}\oint_C \frac{C_j}{(1 - \epsilon_j v)(v - 1)}\Big\{\frac{1 - v^{-k}}{1 - v^{-k}z}\Big\}dv, \tag{17}$$
since the neglected part of the integral has no poles inside $C$. The integral (17)
is most easily evaluated by considering the single pole of the integrand outside
$C$ at $v = 1/\epsilon_j$. The residue is
$$-C_j/(1 - \epsilon_j)\{(1 - \epsilon_j^k)/(1 - \epsilon_j^k z)\}.$$
Since the integrand is rational with denominator of degree at least 2 higher than
that of the numerator, it follows that (17) is equal to minus the residue outside
$C$. Therefore, we have
$$Q^+(z) = \sum_{j=1}^{r} C_j/(1 - \epsilon_j)\{(1 - \epsilon_j^k)/(1 - \epsilon_j^k z)\}. \tag{18}$$
j=l 
EXAMPLE 4. If the service-time distribution is exponential, $E_1$, we have
from (14)
$$Q^+(z) = \frac{1}{2\pi i}\oint_C \frac{(1 - \epsilon)(1 - z)\,dv}{(1 - \epsilon v)(1 - v)v^k(1 - v^{-k}z)} = (1 - \epsilon^k)/(1 - \epsilon^k z),$$
by consideration of the pole at $v = 1/\epsilon$ outside $C$. $1/\epsilon$ is the single root outside
the unit circle of equation (11). This is the p.g.f. of the equilibrium distribution
of queue-size after departures in the system $E_k/E_1/1$ when the traffic intensity
is $\rho/k$. The formula is, however, found to be identical (apart from notation)
with that obtained by Jackson and Nichols [5] for the distribution before arrivals
in the same system. A more general case of this observation is considered in the
next section.


4. Relationship between the queue-size distributions at departure and at
arrival points for the system $E_k/E_r/1$. In our previous paper [4] we obtained the
p.g.f., $Q(z)$, for the equilibrium distribution of queue size before arrivals for the
system $E_k/E_r/1$:
$$Q(z) = \frac{1 - z}{2\pi i}\oint_C \frac{P(v)\,dv}{v(1 - v)(1 - v^{-r}z)}. \tag{19}$$
$C$ is a contour around the origin excluding the poles of $P(v)/(1 - v)$, and
$$P(v) = \prod_{j=1}^{r}\Big(\frac{1 - \gamma_j}{1 - \gamma_j v}\Big),$$
where for traffic intensity $r\rho/k$, the $\gamma_j$'s are the roots inside the unit circle of
$$[\rho/(\rho + 1 - z)]^k = z^r. \tag{20}$$


We shall now establish that $Q(z)$ is identical with $Q^+(z)$ as given by formula
(18) above. We first examine the relationship between the roots, $1/\epsilon_j$, of (9)
and the roots, $\gamma_j$, of (20).

Now if $1/\epsilon_j$ is a root of Equation (9), then
$$\epsilon_j^k = [1 + \rho(1 - 1/\epsilon_j)]^r. \tag{21}$$
Let us define
$$\gamma_j = 1 + \rho(1 - 1/\epsilon_j). \tag{22}$$
It follows that
$$\epsilon_j = \rho/(\rho + 1 - \gamma_j) \tag{23}$$
and, from (21),
$$\epsilon_j^k = \gamma_j^r. \tag{24}$$
Now from (23) and (24),
$$\gamma_j^r = [\rho/(\rho + 1 - \gamma_j)]^k.$$
But this shows that $\gamma_j$ is a root of Equation (20), and moreover, from (24),
$|\gamma_j| < 1$.

The relation (22) thus establishes a one-to-one correspondence between the
$r$ roots, $1/\epsilon_j$, of (9) outside the unit circle and the $r$ roots, $\gamma_j$, of (20) inside the
unit circle. Since we have proved that the roots, $1/\epsilon_j$, are simple, it follows that
the poles of the integrand in (19) outside $C$ are also simple.

From (22), we have
$$\rho(1/\epsilon_j - 1) = 1 - \gamma_j,$$
and
$$\rho(1/\epsilon_j - 1/\epsilon_i) = \gamma_i - \gamma_j.$$
Therefore,
$$(1 - \epsilon_i)/(1 - \epsilon_i/\epsilon_j) = (1 - \gamma_i)/(\gamma_j - \gamma_i).$$
Now define
$$D_j = (1 - \gamma_j)\prod_{i \ne j}(1 - \gamma_i)/(1 - \gamma_i/\gamma_j).$$
It follows that
$$C_j/(1 - \epsilon_j) = \gamma_j^{1-r}D_j/(1 - \gamma_j). \tag{25}$$
Analogously to formulae (16) and (17), we can now write
$$Q(z) = \sum_{j=1}^{r}\frac{1}{2\pi i}\oint_C \frac{D_j(1 - z)\,dv}{(1 - \gamma_j v)(1 - v)v(1 - v^{-r}z)} \tag{26}$$
$$= \sum_{j=1}^{r}\frac{1}{2\pi i}\oint_C \frac{D_j v^{r-1}}{(1 - \gamma_j v)(v - 1)}\Big\{\frac{1 - v^{-r}}{1 - v^{-r}z}\Big\}dv. \tag{27}$$
Each integrand in (27) has a single pole outside $C$ at $v = 1/\gamma_j$, and the residue is
$$-\gamma_j^{1-r}D_j/(1 - \gamma_j)\{(1 - \gamma_j^r)/(1 - \gamma_j^r z)\},$$
which, by using (24) and (25), we can transform to
$$-C_j/(1 - \epsilon_j)\{(1 - \epsilon_j^k)/(1 - \epsilon_j^k z)\}.$$
Since in (27) the sum of the integrands has its denominator of degree higher by
2 than that of the numerator, it follows that $Q(z) = \sum_{j=1}^{r} C_j/(1 - \epsilon_j)\{(1 - \epsilon_j^k)/(1 - \epsilon_j^k z)\}$,
which is formula (18) above, and we have proved that, in the system
$E_k/E_r/1$, the equilibrium distribution of queue size at instants just before arrivals
is identical with that at instants just after departures.

5. Further work. Let $\tau_1, \tau_2, \dots$ denote the sequence of instants at which
units join the batch departures system. In a sequel we shall consider the existence
of the limiting distributions, $\{p_j\}$ and $\{p_j^-\}$, defined, respectively, by
$$p_j = \lim_{t\to\infty} P[\xi(t) = j]$$
and
$$p_j^- = \lim_{n\to\infty} P[\xi(\tau_n - 0) = j].$$
We shall also examine the relationships existing between the three distributions
$\{p_j\}$, $\{p_j^+\}$ and $\{p_j^-\}$.


NOTES


ON THE CHAPMAN-KOLMOGOROV EQUATION¹
By JACK KARUSH
University of California, Berkeley


A partial answer is given to the question of whether every Markov random 
function comes from a system of transition probabilities satisfying the Chapman- 
Kolmogorov equation. A given Markov random function determines the transi- 
tion probabilities up to sets of probability zero and for any choice of the transi- 
tion probabilities the Chapman-Kolmogorov equation holds up to sets of 
probability zero. The problem then is one of selecting appropriate versions of 
the transition probabilities so that the Chapman-Kolmogorov equation holds
everywhere. It is shown that such selections exist whenever the time parameter 
set is countable or whenever the joint distribution of any two of the random 
variables is absolutely continuous with respect to the product of the marginal dis- 
tributions. Although the latter condition is always satisfied when the state 
space is countable, or more generally, when each random variable assumes a 
countable number of values with probability one, this case, being especially 
simple, is treated separately. The results are based on exploiting the device of 
using the marginal distribution when in doubt about what the conditional prob- 
ability distribution should be. 

Let $(X_t, t \in T)$ be a Markov random function, where $T$ is a set of real numbers
with elements denoted by $r, s, t, u, v$. Let $\mathcal{S}$ be the $\sigma$-field of linear Borel sets,
and for every $t$ define $P_t(S) = P[X_t \in S]$, $S \in \mathcal{S}$. For every $s, t$, $s < t$, consider
the joint probability distribution of $X_s, X_t$. There exists what we shall call a
version of the conditional probability distribution of $X_t$ given $X_s$ or, more
concisely, a version of $P(X_t \mid X_s)$, that is, a function $P_{st}$ of $x$, $S$, $x$ real, $S \in \mathcal{S}$,
such that $P_{st}(\cdot, S)$ is Borel for every $S \in \mathcal{S}$, $P_{st}(x, \cdot)$ is a probability distribution
on $\mathcal{S}$ for every $x$, and
$$\int_S P_s(dx)P_{st}(x, S') = P(X_s \in S, X_t \in S'), \quad S, S' \in \mathcal{S}.$$
The Markov property implies that for $r < s < t$, $P_{rs}*P_{st}$ is a version
of $P(X_t \mid X_r)$, where by definition
$$(P_{rs}*P_{st})(x, S) = \int P_{rs}(x, dy)P_{st}(y, S), \quad \text{all } x, S \in \mathcal{S},$$
so that the Chapman-Kolmogorov (C-K) equation
$$P_{rt}(x, S) = (P_{rs}*P_{st})(x, S)$$
holds for $x \notin N \in \mathcal{S}$, $S \in \mathcal{S}$, where $P_r(N) = 0$ and $N$ depends on $r, s, t, S$.

¹ Received September 26, 1960; revised March 27, 1961.

On the other hand the usual approach ([1] pp. 89, 255-6) is to start out with
$(P_{st};\ s, t \in T,\ s < t)$ satisfying the C-K equation identically, together with
an arbitrary initial probability distribution $P_{t_0}$, $T$ being assumed to have a
minimum value $t_0$, and to construct the probability distribution of the corresponding
random functions. A natural question is whether the probability
distributions of all Markov random functions with $T$ having a minimum value
are obtained in this manner, or slightly more generally, whether, or under what
conditions, one may select versions $P_{st}$ of $P(X_t \mid X_s)$, $s < t$, satisfying the C-K
equation identically.

Each of the conditions 1-4 below ensures such a selection; 1 and 2 are special
cases of 4 and 3, respectively, but are isolated because of their simplicity.


1. $T$ = integers. In this case an obvious selection is available. For every $n$
take any version of $P(X_{n+1} \mid X_n)$ and define for $m > 0$, all $n$,
$$P_{n,n+m}(x, S) = \int P_{n,n+1}(x, dy_1)\int P_{n+1,n+2}(y_1, dy_2)\cdots\int P_{n+m-2,n+m-1}(y_{m-2}, dy_{m-1})\,P_{n+m-1,n+m}(y_{m-1}, S), \quad \text{all } x, S \in \mathcal{S}.$$
It is easily verified that $P_{n,n+m}$ is a version of $P(X_{n+m} \mid X_n)$ and the C-K
equation is satisfied identically. This amounts to verifying that the operation
"$*$" is associative.
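
On a finite state space the operation "$*$" is matrix multiplication, which makes the associativity claim easy to see in code. The sketch below is an illustration only; the three one-step kernels are arbitrary stochastic matrices chosen for the example, and the function name `compose` is hypothetical.

```python
def compose(P, R):
    """(P * R)(x, {z}) = sum over y of P(x, {y}) R(y, {z}) on a finite state space."""
    return [[sum(P[x][y] * R[y][z] for y in range(len(R))) for z in range(len(R[0]))]
            for x in range(len(P))]

# Hypothetical one-step kernels P_{0,1}, P_{1,2}, P_{2,3} on two states.
P01 = [[0.5, 0.5], [0.2, 0.8]]
P12 = [[0.9, 0.1], [0.4, 0.6]]
P23 = [[0.3, 0.7], [1.0, 0.0]]

left = compose(compose(P01, P12), P23)     # (P01 * P12) * P23
right = compose(P01, compose(P12, P23))    # P01 * (P12 * P23)
```

Because `left` and `right` agree, defining $P_{0,3}$ by either grouping gives the same kernel, so the Chapman-Kolmogorov equation holds for every intermediate time in this finite illustration.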
2. For every $t$, $P_t$ is discrete, that is, there exists a countable set $C_t$ such that
$P_t(C_t) = 1$. For every $s$, if $P_s(\{x\}) > 0$ then necessarily
$$P_{st}(x, S) = \frac{P[X_s = x, X_t \in S]}{P[X_s = x]}, \quad t > s,\ S \in \mathcal{S},$$
and if $P_s(\{x\}) = 0$ define
$$P_{st}(x, S) = P_t(S), \quad t > s,\ S \in \mathcal{S}.$$
Since $P_s[x: P_s(\{x\}) = 0] = 0$, $P_{st}$ is a version of $P(X_t \mid X_s)$. If $r < s < t$
and $P_r(\{x\}) > 0$ then $P_{rt}(x, S) = (P_{rs}*P_{st})(x, S)$, $S \in \mathcal{S}$. If $P_r(\{x\}) = 0$ then
$$P_{rt}(x, S) = P_t(S) = \int P_s(dy)P_{st}(y, S) = \int P_{rs}(x, dy)P_{st}(y, S) = (P_{rs}*P_{st})(x, S), \quad S \in \mathcal{S}.$$


3. For every $s, t$, $s < t$, there exists a version $P_{st}$ such that $P_{st}(x, \cdot)$ is
absolutely continuous with respect to $P_t$ ($P_{st}(x, \cdot) \ll P_t$) for all $x$, or equivalently,
the joint probability distribution of $X_s, X_t$ $\ll$ the product measure $P_s \times P_t$,
or equivalently, $\ll$ some product measure $\lambda \times \mu$, $\lambda$, $\mu$ $\sigma$-finite. We first establish
the equivalences. Assume $P_{st}(x, \cdot) \ll P_t$ for all $x$ and suppose $(P_s \times P_t)(B) = 0$,
where $B$ is a two-dimensional Borel set; then there exists $N \in \mathcal{S}$ such that $P_s(N) =
0$ and $P_t(B_x) = 0$ for $x \notin N$, where $B_x = [y: (x, y) \in B]$, and we have
$$P[(X_s, X_t) \in B] = \int P_s(dx)P_{st}(x, B_x) = 0.$$
Conversely, if the joint probability distribution of $X_s, X_t \ll \lambda \times \mu$, $\lambda$, $\mu$ $\sigma$-finite,
then $P_s \ll \lambda$ and $P_t \ll \mu$, so that there exist densities $dP_s/d\lambda$, $dP_t/d\mu$.
Let $S = [x: (dP_s/d\lambda)(x) > 0]$, $S' = [x: (dP_t/d\mu)(x) > 0]$. Then $P_s(S) = 1$,
$P_t(S') = 1$, $\lambda \ll P_s$ on $S$, and $\mu \ll P_t$ on $S'$, so that $\lambda \times \mu \ll P_s \times P_t$ on $S \times S'$
and $P[(X_s, X_t) \notin S \times S'] = 0$. It follows that the joint probability distribution
of $X_s, X_t \ll P_s \times P_t$ and therefore has a density which can be taken to be of the
form $p_{st}(x, y)$ where
$$\int p_{st}(x, y)P_t(dy) = 1 \quad \text{for all } x.$$
Then
$$P_{st}(x, S) = \int_S p_{st}(x, y)P_t(dy), \quad \text{all } x, S \in \mathcal{S},$$
defines a version of $P(X_t \mid X_s)$ and $P_{st}(x, \cdot) \ll P_t$ for all $x$.
Let U be the union of a countable dense subset of T and the countable set of
points of T which are not two-sided limit points of T. For every t let

$$N_t = \bigcup_{\substack{t<u<v \\ u,v \in U \\ y \text{ rational}}} [x: P_{tv}(x, S_y) \ne (P_{tu} * P_{uv})(x, S_y)],$$

where $S_y = (-\infty, y)$, and define, for $t < u \in U$,
P(x, -) = Pu(2, -) ifxeN, 
= Pp, weeNn,. 


Since P,(N,) = 0, P, is a version of P(X, | X,) and since a probability distri- 
bution on $ is determined by its values on S, , y rational, we have, for x zN; , 


oe ee ee t<u<vu,vew, 
and hence 
P *P yr) (x, -), t<u<v,u,veU. 
Forze WN, 
= (Py *Py.) (2, 


by the same reasoning used in 2. Therefore 


Pio = PintPac , t<u<ov, uve U. 
> 
Now P;,,.(2, -) < P. for every x; consequently Pula, -)-« P, for every «x. It

follows that P,,.*#P.. is independent of the version P,, of P(X,| X,) for any 
/ ° y > 

s > u. Let P,,, be another version; then for every S ¢ §, 


Ply: Pus(y, S) ¥ Pis(y, S)] = 0 


so that for every 2, Pxu(2, [y: Pus(y, S) ¥ Pus(y, S)]) = 0 and hence 


| Puta, dy) Pus(y, S) = [ Putz, dy) Pus(y, S) 


r; a / . 

or PutP us = PutPus. In particular 

Pu = PusPu = PueP., ti<u <v,4u,v eU- 
If s < t g U there exists a u ¢ U such that s < u < ¢t and we define 
P., = P..*P... Then P,, isa version of P(X, | X,), is independent of the version 
of P(X, | X,,) selected, and is well defined, for ifs << u < v < t,u,v e U, we have 

P,*P o: = (P,,«P,,.)*P.: = Py#( Pio*P:) = PyueP ut 

since P,,,*P,, is a version of P(X,| X,). Finally, the P,,’s satisfy the C — K 
equation identically. Suppose r < s < t. Ifs e U then P,, = P,.*P,, by defi- 
nition. If s ¢ U there exists u ¢ U such thatr <u <s < t and, since P,,,«*P,, 
is a version of P(X, | X.), 


P.. = P..( P.4#P,,) = (Pu#P iy.) *P,, = P,P, . 


4. T is countable. Here we impose no condition on the $X_t$'s, but, guided by
3, we enlarge the exceptional set $N_s$ to obtain absolute continuity to the extent
needed. For every s define $N_s$ as above with U = T and set

$$M_s = N_s \cup \bigcup_{t>s} [x: P_{st}(x, N_t) > 0].$$


Then $P_s(M_s) = 0$ since $P_s(N_s) = 0$ and for $t > s$

$$0 = P_t(N_t) = \int P_s(dx)\, P_{st}(x, N_t)$$

which implies $P_s[x: P_{st}(x, N_t) > 0] = 0$. Suppose $r < s$ and $x \notin M_r$. Then
$P_{rs}(x, M_s) = 0$; for $P_{rs}(x, N_s) = 0$ and if $t > s$,

$$0 = P_{rt}(x, N_t) = \int P_{rs}(x, dy)\, P_{st}(y, N_t)$$

which implies $P_{rs}(x, [y: P_{st}(y, N_t) > 0]) = 0$. For $s < t$ define
P,.(2,-) = Pal(z, -) if 2M, 
= P, ifxe M,. 


Since P,(M,) = 0, P,; is a version of P(X,| X,). Suppose r < s < t. Then 
arguing separately for « zg M, and x ¢ M,, we obtain for all z, PA 2, S,) = 
(P..#P..) (2, S,), y rational, so that Pilz, -) = (P*P..)(2, -). Since 
Paly, -) = Pauly, -) ify eM, and P,,(2, M,) = 0 for all z it follows that P,, = 
PP . 


REFERENCE
[1] J. L. DOOB, Stochastic Processes, John Wiley and Sons, New York, 1953.
A GENERALIZATION OF A THEOREM OF BALAKRISHNAN¹
By N. DONALD YLVISAKER²
New York University


1. Introduction. Given a stochastic process $\{X(t), t \in T\}$ on some probability
space with second moment kernel

$$E[X(s)X(t)] = K(s, t),$$

a characterization is given of the function

$$m(t) = EX(t).$$

This characterization includes the result of Balakrishnan [2] for the case of
second order stationary, discrete or continuous parameter processes.


2. The characterization. Let T be an abstract set and let K be a positive
definite kernel on $T \times T$. A function m on T is said to be an admissible mean
value function for the kernel K if there exists a stochastic process $\{X(t), t \in T\}$
on some probability space with

$$E[X(s)X(t)] = K(s, t) \quad \text{and} \quad EX(t) = m(t).$$

LEMMA 1. m is an admissible mean value function for the kernel K if and only
if $K(s, t) - m(s)m(t)$ is positive definite.

PROOF. If $K(s, t) - m(s)m(t)$ is a positive definite kernel on $T \times T$, let
$\{X(t), t \in T\}$ be a Gaussian process with mean function m and covariance kernel
$K(s, t) - m(s)m(t)$ ([3], p. 72). Then

$$E[X(s)X(t)] = E[X(s) - m(s)][X(t) - m(t)] + m(s)m(t) = K(s, t).$$

Conversely, if m is admissible,

$$E[X(s) - m(s)][X(t) - m(t)] = K(s, t) - m(s)m(t)$$


is positive definite. 
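The criterion of Lemma 1 can be probed numerically on a finite set of points, where it reduces to a matrix test. The sketch below is an illustration only: the helper names, the grid, the Brownian motion kernel min(s, t) and the trial mean functions m(t) = ct are choices made here, and the test is strict (it looks for a Cholesky factor with positive pivots), so it avoids the boundary case. A failure on a finite grid shows m is not admissible; a success on one grid is only a necessary check.

```python
def cholesky_succeeds(a, tol=1e-12):
    """True if the symmetric matrix `a` has a Cholesky factor with all pivots > tol."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: not positive definite
                lower[i][i] = s ** 0.5
            else:
                lower[i][j] = s / lower[j][j]
    return True

def residual_kernel(kernel, mean, points):
    """Matrix of K(s, t) - m(s) m(t) over the given points."""
    return [[kernel(s, t) - mean(s) * mean(t) for t in points] for s in points]

points = [i / 10 for i in range(1, 11)]   # hypothetical grid in (0, 1]
brownian = lambda s, t: min(s, t)         # second moment kernel K(s, t) = min(s, t)

small_mean_ok = cholesky_succeeds(residual_kernel(brownian, lambda t: 0.5 * t, points))
large_mean_ok = cholesky_succeeds(residual_kernel(brownian, lambda t: 1.5 * t, points))
```

With these choices the first check succeeds and the second fails, in line with the bound on the slope that the Brownian motion example later in the paper implies.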


Received January 4, 1961; revised May 29, 1961. 

1 This research was sponsored by the Office of Naval Research under Contract Number 
To characterize these functions m, we introduce, for a positive definite kernel
R on $T \times T$, the corresponding reproducing kernel Hilbert space of functions
on T, denoted by H(R), the dependence on the set T having been suppressed.
For a kernel R, H(R) is specified by the conditions

(1) for every $t \in T$, $R(\cdot, t) \in H(R)$,

(2) for every $t \in T$ and $f \in H(R)$, $(f, R(\cdot, t))_{H(R)} = f(t)$.

From these conditions, the following lemma is apparent.

LEMMA 2. Given a function m ($\not\equiv 0$) on T, $M(s, t) = m(s)m(t)$ is positive
definite on $T \times T$ and H(M) consists of all multiples of the function m with
$\|m\|_{H(M)} = 1$.

We appeal finally to the following general theorem given in [1].

THEOREM 1. Let R and R* be positive definite kernels on $T \times T$. $R - R^*$ is
positive definite if and only if $H(R^*) \subset H(R)$ and for all $f \in H(R^*)$,

$$\|f\|_{H(R^*)} \ge \|f\|_{H(R)}.$$

Returning then to the determination of the functions m for which $K(s, t) - m(s)m(t)$
is positive definite on $T \times T$, we have

THEOREM 2. If K is a positive definite kernel on $T \times T$, then $K(s, t) - m(s)m(t)$
is positive definite if and only if $m \in H(K)$ and $\|m\|_{H(K)} \le 1$.

That is, the admissible mean value functions for a given second moment kernel
K are those functions in the unit sphere of the reproducing kernel space H(K).

Theorem 1 of Balakrishnan may be seen to coincide with Theorem 2 above
when K has the representation

$$K(s, t) = k(s - t) = \int_{-\infty}^{+\infty} \exp[i(s - t)x]\, dG(x), \quad -\infty < s, t < +\infty.$$

Then, according to Theorem 4D of [4], the unit sphere of H(K) consists of
functions of the form

$$m(t) = \int_{-\infty}^{+\infty} \exp(itx)\, u(x)\, dG(x)$$

with

$$\|m\|^2_{H(K)} = \int_{-\infty}^{+\infty} |u(x)|^2\, dG(x) \le 1.$$

In particular stationary cases, alternative representations are known. Thus, if

$$K(s, t) = \exp[-(s - t)^2/2], \quad -\infty < s, t < +\infty,$$

the unit sphere of H(K) consists of analytic functions m for which

$$\sum_{n=0}^{\infty} \frac{1}{n!} \left| \frac{d^n}{dt^n} [\exp(t^2/2)\, m(t)]_{t=0} \right|^2 \le 1.$$

It should be noted that Theorem 2 applies even to stationary kernels which
do not possess the spectral representation.







Lastly, a nonstationary example is provided by the Brownian motion kernel.
For

$$K(s, t) = \min(s, t), \quad 0 \le s, t \le 1,$$

the unit sphere of H(K) consists of absolutely continuous functions m for which
m(0) = 0, and

$$\int_0^1 |m'(t)|^2\, dt \le 1.$$
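For the Brownian motion kernel the norm condition can be approximated on a finite grid, which gives a quick way to see whether a candidate mean function fits in the unit sphere. The sketch below is an illustration under assumptions made here: the grid, the helper names and the candidate m(t) = t² are hypothetical. It uses the standard fact that the norm of the restriction of m to finitely many points is the quadratic form of m in the inverse Gram matrix, which for min(s, t) equals a sum of squared increments divided by time steps and never exceeds the integral of |m'(t)|².

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def grid_norm_squared(kernel, mean, points):
    """Squared RKHS norm of the restriction of `mean` to `points`: m' K^{-1} m."""
    gram = [[kernel(s, t) for t in points] for s in points]
    values = [mean(t) for t in points]
    return sum(v * w for v, w in zip(values, solve(gram, values)))

def increment_energy(mean, points):
    """Sum of (m(t_i) - m(t_{i-1}))^2 / (t_i - t_{i-1}) with t_0 = 0, m(0) = 0."""
    total, prev_t, prev_m = 0.0, 0.0, 0.0
    for t in points:
        total += (mean(t) - prev_m) ** 2 / (t - prev_t)
        prev_t, prev_m = t, mean(t)
    return total

points = [i / 10 for i in range(1, 11)]
brownian = lambda s, t: min(s, t)
square = lambda t: t * t

via_gram = grid_norm_squared(brownian, square, points)
via_increments = increment_energy(square, points)
```

Both values come out at about 1.33, just below the integral 4/3 of |m'(t)|² for m(t) = t², and already above 1, so this candidate is not an admissible mean value function for the kernel min(s, t).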


REFERENCES

[1] ARONSZAJN, N., "Theory of reproducing kernels," Trans. Amer. Math. Soc., Vol. 68
(1950), pp. 337-404.

[2] BALAKRISHNAN, A. V., "On a characterization of covariances," Ann. Math. Stat., Vol.
30 (1959), pp. 670-675.

[3] DOOB, J. L., Stochastic Processes, New York, John Wiley and Sons, 1953.

[4] PARZEN, E., "Statistical inference on time series by Hilbert space methods, I," Tech.
Rep. No. 23 (NR-042-993) (1959), Appl. Math. and Stat. Lab., Stanford University
THE OPINION POOL¹
By M. STONE²
Princeton University
1. Introduction and summary. When a group of k individuals is required to
make a joint decision, it occasionally happens that there is agreement on a
utility function for the problem but that opinions differ on the probabilities of
the relevant states of nature. When the latter are indexed by a parameter $\theta$, to
which probability density functions on some measure $\mu(\theta)$ may be attributed,
suppose the k opinions are given by probability density functions $p_{s1}(\theta), \cdots,
p_{sk}(\theta)$. Suppose that D is the set of available decisions d and that the utility of
d, when the state of nature is $\theta$, is $u(d, \theta)$.
For a probability density function $p(\theta)$, write

$$u[d \mid p(\theta)] = \int u(d, \theta)\, p(\theta)\, d\mu(\theta).$$


The Group Minimax Rule of Savage [1] would have the group select that d
minimising

$$\max_{i=1,\cdots,k} \{\max_{d' \in D} u[d' \mid p_{si}(\theta)] - u[d \mid p_{si}(\theta)]\}.$$

As Savage remarks ([1], p. 175), this rule is undemocratic in that it depends
only on the different distributions for $\theta$ represented in those put forward by the


Received May 1, 1961; revised August 7, 1961. 
1 Prepared in connection with research sponsored by the Office of Naval Research. 
group and not on the number of members of the group supporting each different
representative. 

An alternative rule for choosing d may be stated as follows: “Choose weights
$\lambda_1, \cdots, \lambda_k$ ($\lambda_i \ge 0$, $i = 1, \cdots, k$ and $\sum_1^k \lambda_i = 1$); construct the pooled
density function

$$p_{s\lambda}(\theta) = \sum_{i=1}^{k} \lambda_i p_{si}(\theta);$$

choose the d, say $d_{s\lambda}$, maximising $u[d \mid p_{s\lambda}(\theta)]$.” This rule, which may be called
the Opinion Pool, can be made democratic by setting $\lambda_1 = \cdots = \lambda_k = 1/k$.

Where it is reasonable to suppose that there is an actual, operative probability
distribution, represented by an ‘unknown’ density function $p_a(\theta)$, it is clear
that the group is then acting as if $p_a(\theta)$ were known to be $p_{s\lambda}(\theta)$. If $p_a(\theta)$ were
known, it would be possible to calculate $u[d_{s\lambda} \mid p_a(\theta)]$ and $u[d_{si} \mid p_a(\theta)]$, where
$d_{si}$ is the d maximising $u[d \mid p_{si}(\theta)]$, $i = 1, \cdots, k$ and then to use these quantities
to assess the effect of adopting the Opinion Pool for any given choice of
$\lambda_1, \cdots, \lambda_k$.

It is of general theoretical interest to examine the conditions under which

(1.1)    $u[d_{s\lambda} \mid p_a(\theta)] \ge \min_{i=1,\cdots,k} u[d_{si} \mid p_a(\theta)]$.

Theorems 2.1 and 3.1 provide different sets of sufficient conditions for (1.1) to
hold. Theorem 2.1 requires k = 2 and places a restriction on $p_a(\theta)$ (or, equivalently,
on $p_{s1}(\theta)$ and $p_{s2}(\theta)$); Theorem 3.1 puts conditions on D and $u(d, \theta)$
instead. 
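The quantities entering (1.1) can be computed directly when the parameter takes finitely many values, so that the integrals become sums. The sketch below is only an illustration: the function names, the three decisions and the utility values are hypothetical choices made here and are not taken from the paper.

```python
from fractions import Fraction as F

def expected_utility(utility, density, decision):
    """u[d | p] as a finite sum over the support of the density."""
    return sum(p * utility[decision][theta] for theta, p in density.items())

def best_decision(utility, density):
    """A decision maximising u[d | p] (first maximiser if there are ties)."""
    return max(utility, key=lambda d: expected_utility(utility, density, d))

def pool(opinions, weights):
    """Pooled density: the weighted sum of the individual densities."""
    pooled = {}
    for w, density in zip(weights, opinions):
        for theta, p in density.items():
            pooled[theta] = pooled.get(theta, 0) + w * p
    return pooled

def pool_satisfies_bound(utility, opinions, weights, actual):
    """Check (1.1): the pooled decision does at least as well, under the
    actual density, as the worst of the individual decisions."""
    d_pool = best_decision(utility, pool(opinions, weights))
    individual = [best_decision(utility, p) for p in opinions]
    floor = min(expected_utility(utility, actual, d) for d in individual)
    return expected_utility(utility, actual, d_pool) >= floor

# Hypothetical utilities u(d, theta) for three decisions and three states.
utility = {
    "d1": {"t1": F(1), "t2": F(0), "ta": F(1, 4)},
    "d2": {"t1": F(0), "t2": F(1), "ta": F(1, 2)},
    "d3": {"t1": F(3, 4), "t2": F(3, 4), "ta": F(0)},
}
opinions = [{"t1": F(1)}, {"t2": F(1)}]   # two atomic opinions
weights = [F(1, 2), F(1, 2)]              # the democratic choice for k = 2

actual_outside = pool_satisfies_bound(utility, opinions, weights, {"ta": F(1)})
actual_mixture = pool_satisfies_bound(
    utility, opinions, weights, {"t1": F(1, 2), "t2": F(1, 2)})
```

With these numbers the bound fails when the actual distribution sits on a state that neither opinion considers, and holds when the actual density is a mixture of the two opinions.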


2. The case of k = 2. The following example shows that conditions are needed
for (1.1) to hold. With k = 2, suppose that $p_{s1}(\theta)$, $p_{s2}(\theta)$, $p_a(\theta)$ are given by
atoms of probability one on $\theta_1$, $\theta_2$, $\theta_a$ respectively, where $\theta_1$, $\theta_2$, $\theta_a$ are different:
also suppose that D has only three elements $d_1$, $d_2$, $d_3$ and that


u(d; 9 6;) = .. u(ds ; 6;) = 0, u(ds : 6;) = 3, 
u(d; , 6) = 0, u(dz, 0) = 1, u(d3, 2) = #, 
u(d;, 0.) = 4, u(d2, 0.) = 3, u(d;, 0.) = 0. 


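Then $d_{s1} = d_1$, $d_{s2} = d_2$ and, for $\lambda_1 = \lambda_2 = \tfrac12$, $d_{s\lambda} = d_3$ and (1.1) does not obtain.

However, the following theorem may be stated:

THEOREM 2.1. If, for some $\mu_1$, $\mu_2$, $p_a(\theta) = \mu_1 p_{s1}(\theta) + \mu_2 p_{s2}(\theta)$, then (1.1)
holds for any weights $\lambda_1$, $\lambda_2$. (As heretofore explicit, the assumption is made
that $d_{s1}$, $d_{s2}$, $d_{s\lambda}$ exist.)

PROOF. $d_{si}$ maximises $u[d \mid p_{si}(\theta)]$, i = 1, 2, and $d_{s\lambda}$ maximises $u[d \mid p_{s\lambda}(\theta)]$
or $\lambda_1 u[d \mid p_{s1}(\theta)] + \lambda_2 u[d \mid p_{s2}(\theta)]$. Writing $b_{ij}$ for $u[d_{si} \mid p_{sj}(\theta)] - u[d_{s\lambda} \mid p_{sj}(\theta)]$,
it follows that

(2.1)    $b_{11} \ge 0$,
(2.2)    $b_{22} \ge 0$,
(2.3)    $\lambda_1 b_{11} + \lambda_2 b_{12} \le 0$,
(2.4)    $\lambda_1 b_{21} + \lambda_2 b_{22} \le 0$.

For (1.1) to hold, it is necessary that either

(2.5)    $\mu_1 b_{11} + \mu_2 b_{12} \le 0$

or

(2.6)    $\mu_1 b_{21} + \mu_2 b_{22} \le 0$.

Now it is necessary that $\mu_1 + \mu_2 = 1$ so that, if $\mu_1 \le \lambda_1$, (2.1) and (2.3) imply
(2.5); while, if $\mu_1 > \lambda_1$, (2.2) and (2.4) imply (2.6). Therefore (1.1) holds and
the theorem is established.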

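EXAMPLE. If each of $p_a(\theta)$, $p_{s1}(\theta)$, $p_{s2}(\theta)$ is atomic on two $\theta$-points and if
$p_{s1}(\theta)$, $p_{s2}(\theta)$ are not identical, $p_a(\theta)$ may be written as $\mu_1 p_{s1}(\theta) + \mu_2 p_{s2}(\theta)$ and
(1.1) obtains. If $p_{s1}(\theta) = p_{s2}(\theta)$, (1.1) clearly obtains.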


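3. The general case. That the condition $p_a(\theta) = \mu_1 p_{s1}(\theta) + \cdots + \mu_k p_{sk}(\theta)$
is not sufficient for (1.1), when k > 2, follows from the following example:
Suppose that k = 3 and that $p_{si}(\theta)$ is given by an atom of probability one at
$\theta = \theta_i$ for i = 1, 2, 3 where $\theta_1$, $\theta_2$, $\theta_3$ are different; also suppose that D has
only four elements $d_0$, $d_1$, $d_2$, $d_3$ for which

$u(d_0, \theta_1) = \tfrac32$,  $u(d_1, \theta_1) = 2\tfrac12$,  $u(d_2, \theta_1) = \tfrac14$,  $u(d_3, \theta_1) = \tfrac14$,

$u(d_0, \theta_2) = \tfrac32$,  $u(d_1, \theta_2) = \tfrac14$,  $u(d_2, \theta_2) = 2\tfrac12$,  $u(d_3, \theta_2) = \tfrac14$,

$u(d_0, \theta_3) = 0$,  $u(d_1, \theta_3) = \tfrac14$,  $u(d_2, \theta_3) = \tfrac14$,  $u(d_3, \theta_3) = 2\tfrac12$.

Choose a small positive number $\epsilon$. Suppose $[\mu_1, \mu_2, \mu_3]$ is such that $p_a(\theta)$ is
atomic on $[\theta_1, \theta_2, \theta_3]$ with

$$[p_a(\theta_1), p_a(\theta_2), p_a(\theta_3)] = [\tfrac13(1 - \tfrac12\epsilon),\ \tfrac13(1 - \tfrac12\epsilon),\ \tfrac13(1 + \epsilon)].$$

Take $[\lambda_1, \lambda_2, \lambda_3]$ so that $p_{s\lambda}(\theta)$ is atomic on $[\theta_1, \theta_2, \theta_3]$ with

$$[p_{s\lambda}(\theta_1), p_{s\lambda}(\theta_2), p_{s\lambda}(\theta_3)] = [\tfrac13(1 + \tfrac12\epsilon),\ \tfrac13(1 + \tfrac12\epsilon),\ \tfrac13(1 - \epsilon)].$$

Then $u[d_0 \mid p_{s\lambda}(\theta)] = 1 + \tfrac12\epsilon$, $u[d_1 \mid p_{s\lambda}(\theta)] = u[d_2 \mid p_{s\lambda}(\theta)] = 1 + 9\epsilon/24$,
$u[d_3 \mid p_{s\lambda}(\theta)] = 1 - 3\epsilon/4$; whence $d_{s\lambda} = d_0$. Also, by symmetry, $u[d_0 \mid p_a(\theta)] =
1 - \tfrac12\epsilon$, $u[d_1 \mid p_a(\theta)] = u[d_2 \mid p_a(\theta)] = 1 - 9\epsilon/24$, $u[d_3 \mid p_a(\theta)] = 1 + 3\epsilon/4$;
whence

$$u[d_{s\lambda} \mid p_a(\theta)] = u[d_0 \mid p_a(\theta)] < \min \{u[d_{si} \mid p_a(\theta)] \mid i = 1, 2, 3\}$$

so that (1.1) does not hold.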
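The arithmetic of this example can be checked with exact fractions. The sketch below assumes that the tabulated utilities are 3/2, 5/2, 1/4 and 0, which is the reading that reproduces the expected utilities stated in the text; the value of epsilon and the variable names are arbitrary choices made here.

```python
from fractions import Fraction as F

# Assumed utilities u(d, theta_i): rows d0..d3, columns theta_1..theta_3.
U = {
    "d0": [F(3, 2), F(3, 2), F(0)],
    "d1": [F(5, 2), F(1, 4), F(1, 4)],
    "d2": [F(1, 4), F(5, 2), F(1, 4)],
    "d3": [F(1, 4), F(1, 4), F(5, 2)],
}
eps = F(1, 10)          # any small positive number
third = F(1, 3)
p_actual = [third * (1 - eps / 2), third * (1 - eps / 2), third * (1 + eps)]
p_pooled = [third * (1 + eps / 2), third * (1 + eps / 2), third * (1 - eps)]

def eu(decision, density):
    """Expected utility of a decision under a density on the three states."""
    return sum(u * p for u, p in zip(U[decision], density))

# Decision chosen by the Opinion Pool.
d_pool = max(U, key=lambda d: eu(d, p_pooled))

# Each opinion is an atom at theta_i, so its decision maximises the i-th column.
d_individual = [max(U, key=lambda d: U[d][i]) for i in range(3)]
bound = min(eu(d, p_actual) for d in d_individual)
violates = eu(d_pool, p_actual) < bound
```

For epsilon = 1/10 the pooled decision is d0 with actual expected utility 19/20, while the worst individual decision achieves 77/80, so (1.1) fails as stated.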
Theorem 2.1 gives conditions on k and $p_a(\theta)$ for (1.1) to obtain. The following
theorem gives conditions on only D and $u(d, \theta)$ for (1.1) to obtain.
THEOREM 3.1. If (i) D is an interval of real numbers (ii) $-u(d, \theta)$ is, for each
$\theta$, a strictly convex function of d then (1.1) holds for all weights $\lambda_1, \cdots, \lambda_k$. (The
assumption is made that $d_{s1}, \cdots, d_{sk}, d_{s\lambda}$ exist.)

PROOF. Consider any three different elements $d_1$, $d_2$, $d_3$ of D such that
$d_2 = \rho d_1 + (1 - \rho)d_3$, $0 < \rho < 1$. Then, for all $\theta$, $u(d_2, \theta) > \rho u(d_1, \theta) +
(1 - \rho)u(d_3, \theta)$ and hence $u[d_2 \mid p(\theta)] > \rho u[d_1 \mid p(\theta)] + (1 - \rho)u[d_3 \mid p(\theta)]$.
Therefore $-u[d \mid p_a(\theta)]$, $-u[d \mid p_{si}(\theta)]$, $i = 1, \cdots, k$, are strictly convex in d.
Let $d_m = \min \{d_{s1}, \cdots, d_{sk}\}$ and $d_M = \max \{d_{s1}, \cdots, d_{sk}\}$. For $d_m \le d \le d_M$,
by the convexity of $-u[d \mid p_a(\theta)]$,

(3.1)    $u[d \mid p_a(\theta)] \ge \min \{u[d_m \mid p_a(\theta)],\ u[d_M \mid p_a(\theta)]\}$.

Hence

(3.2)    $\min_{i=1,\cdots,k} u[d_{si} \mid p_a(\theta)] = \min \{u[d_m \mid p_a(\theta)],\ u[d_M \mid p_a(\theta)]\}$.

For weights $\lambda_1, \cdots, \lambda_k$, if $d_m \le d_{s\lambda} \le d_M$, (3.1) and (3.2) together imply
(1.1). However, if $d_{s\lambda} < d_m$, there exists a $d^* \in D$ and $\rho_i^*$, $0 < \rho_i^* < 1$, $i = 1,
\cdots, k$, such that $d_{s\lambda} < d^* < d_m$ and $d^* = \rho_i^* d_{s\lambda} + (1 - \rho_i^*)d_{si}$, $i = 1, \cdots, k$.
By the established strict convexities,

$$u[d^* \mid p_{si}(\theta)] > \rho_i^* u[d_{s\lambda} \mid p_{si}(\theta)] + (1 - \rho_i^*) u[d_{si} \mid p_{si}(\theta)]$$
$$\ge \rho_i^* u[d_{s\lambda} \mid p_{si}(\theta)] + (1 - \rho_i^*) u[d_{s\lambda} \mid p_{si}(\theta)]$$
$$= u[d_{s\lambda} \mid p_{si}(\theta)],$$

whence $\sum_1^k \lambda_i u[d^* \mid p_{si}(\theta)] > \sum_1^k \lambda_i u[d_{s\lambda} \mid p_{si}(\theta)]$ or

$$u[d^* \mid p_{s\lambda}(\theta)] > u[d_{s\lambda} \mid p_{s\lambda}(\theta)],$$

a contradiction. Hence $d_{s\lambda} < d_m$ is impossible; and so is $d_M < d_{s\lambda}$. Therefore the
theorem is established.
EXAMPLE. D is an interval, $\theta$ is a real parameter and $u(d, \theta) = -(d - \theta)^2$.
Because $(d - \theta)^2$ is strictly convex in d for each $\theta$, (1.1) obtains.
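The quadratic example can be made concrete. Under squared error the decision maximising expected utility for a density is its mean, so the pooled decision is the weighted average of the individual decisions and lies between the smallest and the largest of them. The sketch below illustrates this with finitely supported densities; the particular opinions, weights and actual density are hypothetical, and D is taken to be the whole real line.

```python
def mean(density):
    """Mean of a finitely supported density {theta: probability}."""
    return sum(theta * p for theta, p in density.items())

def expected_utility(decision, density):
    """u[d | p] for the utility u(d, theta) = -(d - theta)^2."""
    return -sum(p * (decision - theta) ** 2 for theta, p in density.items())

# Hypothetical opinions, weights and actual density.
opinions = [{0.0: 0.5, 1.0: 0.5}, {2.0: 1.0}, {4.0: 0.25, 6.0: 0.75}]
weights = [0.2, 0.3, 0.5]
actual = {1.0: 0.5, 3.0: 0.5}

individual = [mean(p) for p in opinions]                          # the d_si
pooled_decision = sum(w * d for w, d in zip(weights, individual)) # mean of the pool

between = min(individual) <= pooled_decision <= max(individual)
bound_holds = expected_utility(pooled_decision, actual) >= min(
    expected_utility(d, actual) for d in individual)
```

Here the individual decisions are 0.5, 2.0 and 5.5, the pooled decision is 3.45, and its expected utility under the actual density exceeds that of the worst individual decision, as Theorem 3.1 requires.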
In conclusion, it may be noted that it is quite possible to have

$$u[d_{s\lambda} \mid p_a(\theta)] > \max \{u[d_{si} \mid p_a(\theta)] \mid i = 1, \cdots, k\}.$$

For example, this will occur (for all but degenerate cases) when

$$p_a(\theta) = \sum_{i=1}^{k} \mu_i p_{si}(\theta)$$

and $\lambda_i = \mu_i$, $i = 1, \cdots, k$.
CORRECTION NOTES AND ACKNOWLEDGMENTS OF PRIORITY


CORRECTIONS TO 
“STATISTICAL METHODS IN MARKOV CHAINS” 
By Patrick BILLINGSLEY 
The University of Chicago 
E. S. Keeping has pointed out a mistake in the proof of Theorem 2.1 of the 
above-titled paper (Ann. Math. Stat., Vol. 32 (1961), pp. 12-40). The error can 
be corrected by making the following changes. Replace the display preceding 
(2.4) by 
Ni (F) = Dow NLT (F(w, »)), 

and in the following line replace f.. > 0 by fu» > 0. Replace (2.4) by 

Fe. = Dvledo: Feu(w, v). 
In line 1, page 15, replace F* (u.->) by F* (w,v), “column” by “row’’, and 

Feo(u, w) = Few 


by Fe,.(w,v) = F%,, . In line 3, replace pF nl by Pee in two places. 
Misprints: p. 13, line 27, for 1 read pj; ; p. 22, line 19, for 7 read 1; a factor of 


2 is missing on the right in the first display on p. 26, and on the left in (5.4), 
(5.5), and (5.6). 


CORRECTION TO 
“A CONSERVATIVE PROPERTY OF BINOMIAL TESTS” 
By H. A. DAVID
Virginia Polytechnic Institute 


In the proof of inequality (1) of the above note (Ann. Math. Stat., Vol. 31 
(1960), pp. 1205-1207) it is tacitly assumed, near equation (4), that P can be a 
maximum only if Pr (S,-; 2 a,) is a maximum for any given 7, . I am indebted 
to Dr. W. Hoeffding for pointing this out. His proof, cited in my note, establishes 
the inequality without such an assumption. 


ACKNOWLEDGMENT OF PRIORITY 
By V. P. GODAMBE
Science College, Nagpur, India


In connection with my article “An optimum property of maximum likelihood
estimation” (Ann. Math. Stat., Vol. 31 (1960), pp. 1208-1211), I wish to acknowledge
that Professor S. S. Wilks proved a special case of my theorem in his
article “Shortest average confidence intervals from large samples,” (Ann. Math.
Stat., Vol. 9 (1938), p. 172), which was overlooked during my research.




CORRECTION TO 
“A PROOF OF WALD’S THEOREM ON CUMULATIVE SUMS” 
By N. L. JOHNSON
University College, London


The following correction should be made in the above-titled article (Ann.
Math. Stat., Vol. 30 (1959), pp. 1245-7): On p. 1245, $E(n) < \infty$ should be added
to condition (iii) of Theorem 1.




CORRECTION TO 
“ON THE MUTUAL INDEPENDENCE OF CERTAIN STATISTICS” 
By C. G. KHATRI
University of Baroda 
I am indebted to Robert Wijsman for calling my attention to the following 
misprints in the above mentioned paper (Ann. Math. Stat., Vol. 30 (1959), pp. 
1258-1262). 
(i) Page 1258, the last two lines of (2.2) should read “the elements below the 
principal diagonals are 


| Ajs|/(| Ass] -| Asaa])?, 


( 8u 


and the vertical bars on both sides of a matrix denote the determinant of that 
matrix.” 

(ii) Page 1259, the reference of (2.7) should be “Ingram Olkin, Institute of 
Statistics Mimeographed Series No. 43 (1951), p. 74, Corollary 1.30” instead 
of [3]. 

(iii) Page 1259, replace the line after (2.8) by “where dX = [Jj; dz; and 
dS = []iz; ds;;.” 

(iv) Page 1259, Section 3, line 4, replace the last S by S, . 

(v) Page 1259, Section 3, line 5, the matrix J — ZZ’ should be in vertical 
bars rather than in parentheses. 
CORRECTION AND ACKNOWLEDGMENT OF PRIORITY TO
“FIRST PASSAGE TIMES FOR A GENERALIZED RANDOM WALK”
By JOHN R. KINNEY
Massachusetts Institute of Technology 


The following correction should be made in the above-titled article (Ann. 
Math. Stat., Vol. 32 (1961), pp. 235-243). The equations on line 13, p. 239 should 
read 


L"M = L"MH+K so G* = G*H + K. 


At the time the above paper was published the author was unaware that 
Theorem 2 was a version of Wald’s Identity as shown in the following reference 
(A. Wald, ‘““On cumulative sums of random variables,’ Ann. Math. Stat., Vol. 
15 (1944), pp. 283-296). 



CORRECTION TO 


“TABLES OF EXPECTED VALUES OF ORDER STATISTICS AND 
PRODUCTS OF ORDER STATISTICS FOR SAMPLES OF SIZE 
TWENTY AND LESS FROM THE NORMAL DISTRIBUTION” 


EDITED BY D. TEICHROEW
Stanford University 


The following correction should be made in the above-titled paper (Ann. Math. 
Stat., Vol. 27 (1956), pp. 410-426). On p. 416, for N = 10,7 = 3, the last digit 
of the expectation should be 4, not 7; thus the expression should read .65605 
91054. 





ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Central Regional Meeting of the Institute, Urbana, 
Illinois, November 24-25, 1961. Additional abstracts will appear in the 
March, 1962 issue.) 


1. Admissibility of the Optimal Invariant Estimate for a Translation Parameter
Under Absolute Error Loss Function. MARTIN FOX AND HERMAN RUBIN,
Michigan State University.


Let P satisfy the conditions given by Stein (Ann. Math. Stat., Vol. 30, pp. 970-979) with
the following changes: (i) replace Stein’s condition (2.6) with the condition

$$\int d\nu(y) \int x^2\, d_x P(x, y) < \infty$$

and (ii) add the condition that the unique median of $P(\cdot, y)$ be at 0 for each $y \in Y$. Then,
with absolute error loss function, X is an almost admissible estimate of the translation
parameter $\theta$ where $(X - \theta, Y)$ has the joint distribution P. The proof is similar to Stein’s
but somewhat more intricate. Furthermore, under the assumption that $p(\cdot, y)$ is a density
for each $y \in Y$, Stein’s proof of admissibility goes through. An example shows that almost
admissibility is the best that can be obtained without a density. Farrell (Ithaca meeting,
April 20-22, 1961) has shown that, if the median is nonunique, then there is no admissible
estimate. The results stated above are still valid if the loss function is weighted by a if the
estimate is to the left of $\theta$ and by b otherwise. The condition on the median is replaced by
the same condition for the $(1 - \alpha)$th quantile where $\alpha = a/(a + b)$.


2. Unbiased Estimation of Probability Densities (Preliminary report). S. G.
GHURYE, University of Minnesota.


Let $y = (y_1, \cdots, y_n)$ be a sample from a k-dimensional population P, which is an element
of a family, $\mathcal{P}$, of distributions. Let g(x) be a known numerical-valued function on $R_k$ with
finite expectation $\mu_P = \int_{R_k} g\, dP$, for all $P \in \mathcal{P}$. It is desired to find an unbiased estimate
$\phi(y)$ of $\mu_P$. If $\mathcal{P}$ is a dominated family with respective probability densities $f_P(x)$ relative
to a known measure $\mu$ on the k-dimensional Borel sets, then the problem is equivalent to
that of finding $\phi(x, y)$ satisfying $E_P \phi(x, y) = f_P(x)$ for all $x \in R_k$, $P \in \mathcal{P}$. A number of special
cases of the problem have been treated previously [Kolmogorov, Izvest. Akad. Nauk SSSR,
Ser. Mat. (1950); Lieberman and Resnikoff, J. Amer. Stat. Assoc. (1955); Washio, Morimoto
and Ikeda, Bull. Math. Stat. (1956); Schmetterer, Ann. Math. Stat. (1960)]. We give a detailed
discussion for many families of densities, and also consider certain functions of
densities.


3. On the Resolution of Statistical Hypotheses. ROBERT V. HOGG, University
of Iowa.


Let $\omega_0$ be the space of a parameter $\theta$. Let $\omega_i$ be a subset of $\omega_{i-1}$, $i = 1, 2, \cdots, k$. We test
$\theta \in \omega_k$ against $\theta \in \omega_0 - \omega_k$ by testing iteratively the following hypotheses: $\theta \in \omega_i$ against
$\theta \in \omega_{i-1} - \omega_i$, $i = 1, 2, \cdots, k$. The hypothesis $\theta \in \omega_k$ is accepted if and only if each intermediate
hypothesis is accepted. If the test statistic for each intermediate hypothesis
$\theta \in \omega_i$ is based on the corresponding likelihood ratio $\lambda_i$, we demonstrate why, under fairly
general conditions, these test statistics are mutually stochastically independent. This
argument is based on an independence theorem which deals with complete sufficient statistics.




4. A $3(2^{8-4})$ Design of Resolution V. PETER W. M. JOHN, University of California,
Davis. (By title)


The smallest $2^{n-p}$ fraction of the $2^8$ design of resolution V (main effects and two factor
interactions clear) is the quarter replicate, involving 64 points. A three-sixteenth replicate,
48 points, of resolution V is obtained, in which each of the main effects and two factor interactions
is estimable from at least a combination of two of the sixteenths. In the three-quarter
replicate of the $2^6$ design, $3(2^{6-2})$, obtained by omitting the quarter replicate
defined by I = ABC = DEF = ABCDEF, put G = ABDE and H = ACDF to form the
$3(2^{8-4})$ design. The three sixteenths may be combined in pairs to give the following eighth
replicates from which the desired estimates are obtained. They are the fractions whose
defining contrast subgroups are generated by (i) ABC, ABDEG, ACDFH; (ii) DEF, ABDEG,
ACDFH; (iii) ABCDEF, ABDEG, ACDFH.


5. The Bivariate Chi Distribution. P. R. KRISHNAIAH, PETER HAGIS, JR. AND
LEON STEINBERG, Remington Rand Univac, Blue Bell, Penna.


Consider a vector $X_j = (X_{1j}, X_{2j})$ of two random variables whose joint distribution is
the bivariate normal with mean vector $\mu = (\mu_1, \mu_2)$ and covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$

If we let $\chi_1 = [\sum_{j=1}^{n} (X_{1j}^2/\sigma_1^2)]^{1/2}$ and $\chi_2 = [\sum_{j=1}^{n} (X_{2j}^2/\sigma_2^2)]^{1/2}$, then the joint distribution of
$\chi_1$ and $\chi_2$ is called the central or non-central bivariate chi distribution according to whether
$\mu = 0$ or $\mu \ne 0$. In the present paper, some properties of this distribution are discussed.
Also, a test is proposed to test for the homogeneity of mean lives of machines when the
failure times follow a bivariate chi or chi-square distribution with known correlation. The
monotonicity property of the power of this test is established and extensive tables are constructed
for use in applications of the test. Applications of the test in areas other than life
testing are also discussed.


6. On the Efficiency of Optimum Nonparametric Procedures in the Two
Sample Case. P. W. MIKULSKI, University of California, Berkeley.


Consider the hypothesis that two samples are drawn from two populations with the same,
continuous, completely specified distribution F. The alternative is that the distribution in
the second population is shifted to the right. For testing this hypothesis consider: (a) the
locally most powerful rank test, and (b) the locally asymptotically optimal test (J. Neyman:
“Optimal Asymptotic Tests of Composite Statistical Hypotheses,” H. Cramér Volume,
1959), or the asymptotically equivalent likelihood ratio test. The question is investigated,
how the Pitman efficiency $e(F; \Psi)$ of the optimal rank test to the optimal parametric test
behaves, if the true distribution $\Psi$ departs from the assumed distribution F. Chernoff and
Savage have shown that if F is normal, then $e(F; \Psi) \ge 1$ for all $\Psi$. It turns out that under
certain regularity restrictions for the assumed distribution, normality of F is a necessary
condition for the inequality $e(F; \Psi) \ge 1$ to hold for all $\Psi$. In particular if the logarithmic
derivative of the density of F is bounded and satisfies additional regularity conditions, then
for every $\epsilon > 0$ there exists a true distribution $\Psi$ such that $e(F; \Psi) < \epsilon$. If however $\Psi$ differs
from F only by location and scale parameters then $e(F; \Psi) \ge 1$ with strict inequality holding
unless either $\Psi = F$ or F is normal.
7. Group-Screening with More Than Two Stages. M. S. PATEL, Research
Triangle Institute.


The two stage group-screening procedure described earlier by W. S. Connor (Cf., The
Proceedings of the Sixth Conference on the Design of Experiments in Army Research
Development and Testing) and G. S. Watson (A study of the Group-screening method,
August issue of Technometrics, 1961) is extended to multiple stages for the case when responses
are observed with negligible error. Because of their potential usefulness, three and
four stage procedures are treated in some detail. The general n + 1 stage procedure is defined,
a formula is developed for the expected number of runs, and for a fixed number of
factors is minimized with respect to the sizes of the group-factors at various stages. Finally
the procedures at different stages are compared with respect to the minimum expected number
of runs.


(Abstracts of papers presented at the Eastern Regional Meeting of the Institute, 
New York City, December 27-30, 1961. Additional abstracts will appear in 
the March, 1962 issue.) 


1. Extensions of the Arc Sine Law. SIMEON M. BERMAN, Columbia University.


An arc sine law for the number of positive partial sums in a sequence of “symmetrically
dependent” random variables is obtained by means of the de Finetti representation theorem;
this arc sine distribution is more general than that obtained by E. S. Andersen (1954)
and holds under wider conditions. A secondary result of the paper is a direct generalization
of an arc sine law of D. A. Darling (1951) for sums of independent random variables.


2. Application of Simultaneous Confidence Intervals to Two Regression Problems
(Preliminary report). ARTHUR COHEN, Columbia University.


Consider the general linear hypothesis of full rank; that is, let y = Xb + u where y is an
$n \times 1$ vector of observations, X is a fixed $n \times p$ matrix of rank p, b is a $p \times 1$ vector of
parameters, and u is an $n \times 1$ vector which is multivariate normal with mean vector zero
and covariance matrix $\sigma^2 I$. H. Scheffé has shown how to obtain simultaneous confidence
intervals for any number of estimable functions. His result is used to show how to obtain
simultaneous confidence intervals for any number of parameters which are the ratios of
linear combinations of the parameters in b. This latter result is applied to the multiple
bioassay problem.

J. Mandel (Ann. Math. Stat., Vol. 29 (1958), pp. 903-907) has shown how one might obtain 
simultaneous confidence intervals for any number of any real functions of the parameters 
in b. His result, along with the above mentioned result on the ratios of estimable functions, 
is used to test whether one quadratic regression function lies uniformly above another 
quadratic regression function over any given interval of abscissae. 


3. Contributions to the “Two-Armed Bandit” Problem. DORIAN FELDMAN,
Michigan State University.


The Bayes sequential design is obtained for an optimization problem involving the choice
of experiments. Given are experiments A, B, densities $p_1$, $p_2$, a positive integer N (which
may be $\infty$) and a number $\xi \in [0, 1]$. A sequence of N observations is to be made such that at
each stage either A or B is observed, the loss being 1 if the experiment with density $p_2$ is
chosen, 0 otherwise. $\xi$ is the prior probability that A has density $p_1$ and the risk of a procedure
is the expected number of observations on the experiment with density $p_2$, given $\xi$.
Let $R_N^A(\xi)$, $R_N^B(\xi)$ denote the risks of the procedures that choose A first, respectively B first,
and follow the optimal procedure for the last N - 1 trials. It is shown inductively that for
all N the difference between these risks is monotone in $\xi$ and this is equivalent to optimality
of the following procedure: At stage i + 1 (regardless of N) choose A or B according as
$\xi_i \ge \tfrac12$ or $\xi_i \le \tfrac12$. $\xi_i$ is the posterior probability that A has $p_1$. For $N = \infty$, the risk of this
procedure is shown to be finite (hence optimal) and some specific risk functions are computed
for binomial experiments.
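The stated procedure can be sketched for Bernoulli observations. The sketch assumes, as a reading of the abstract, that exactly one of A and B has density p1 and the other has density p2; the function names and the success probabilities 0.8 and 0.3 are hypothetical.

```python
def update(xi, arm, outcome, p1, p2):
    """Posterior probability that A has density p1 after observing
    `outcome` on `arm`, starting from prior probability xi."""
    if arm == "A":
        like_a_has_p1, like_a_has_p2 = p1(outcome), p2(outcome)
    else:
        # If A has p1 then B has p2, and conversely (assumed here).
        like_a_has_p1, like_a_has_p2 = p2(outcome), p1(outcome)
    numerator = xi * like_a_has_p1
    return numerator / (numerator + (1 - xi) * like_a_has_p2)

def choose(xi):
    """Observe A when the posterior that A has p1 is at least 1/2, else B."""
    return "A" if xi >= 0.5 else "B"

def bernoulli(success_probability):
    """Density of a 0/1 outcome with the given success probability."""
    return lambda x: success_probability if x == 1 else 1 - success_probability

p1, p2 = bernoulli(0.8), bernoulli(0.3)   # hypothetical densities

xi = 0.5
first_arm = choose(xi)                        # ties are resolved in favour of A here
xi_after_success = update(xi, "A", 1, p1, p2)
xi_after_failure = update(xi, "A", 0, p1, p2)
```

A success on A raises the posterior to 8/11, so A is observed again; a failure lowers it to 2/9, so the procedure switches to B.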


4. On the Axioms of Sample Formation and Their Bearing on the Construction 
of Linear Estimators in Sampling Theory for Finite Populations. J. C. 
Koop, North Carolina State College. 


Consider a universe (or population) of N elements described by a frame which in this case 
will be a simple list. In drawing a sample according to any probability system defined for the 
selection of the elements one at a time, either with or without replacement, three features 
inherent in the nature of the process of selection are evident. They are as follows: (i) the 
order of appearance of the elements, (ii) the presence or absence of any given element in the 
sample which is a member of the universe or population, and (iii) the set of elements com- 
posing the sample considered as one of the total number possible (in repeated sampling) 
according to the given probability system. As the veracity of the statements at (i), (ii) and 
(iii) is self-evident, they may be designated as axioms. It will be noted that the statements 
are not mutually contradictory. These features, inherent in the actual process of selection 
and as a result sample formation, supply the bases for the construction of linear estimators 
on a deductive basis. The axioms, which are implicit in the work of Horvitz and Thompson 
(Ann. Math. Stat., Vol. 22 (1951), p. 315), considered singly, two at a time and most gen- 
erally all three together, result in seven very general classes of linear estimators. The exten- 
sion of the application of these axioms to samples from a universe with subdivisions (strata, 
first-stage units, second-stage units, etc.) is almost immediate. 


5. Maximum Likelihood Estimates for Certain Contagious Distributions Using
High Speed Computers. DONALD C. MARTIN AND S. K. KATTI, Florida
State University.


Fortransit programs for fitting the Neyman Type A, the Poisson with Zeros and the 
simple Poisson, the Negative Binomial, and the Poisson Binomial by the method of maxi- 
mum likelihood are available from the Statistics Department of the Florida State Uni- 
versity. These programs have five operational modes allowing for combinations of the 
following: (i) Computing moment estimates. (ii) Using moment estimates or reading the 
initial parameter estimates and computing the maximum likelihood estimates by an itera- 
tive scheme. (iii) Reading in estimates computed by other means and using these to compute 
probabilities, expected frequencies and chi-square values, thus bypassing the maximum 
likelihood estimation process. (iv) Computing the probabilities, cumulative probabilities, 
expected frequencies, and term by term chi-square values and (v) Computing the chi- 
square value with some rudimentary grouping or the likelihood value. The chi-square 
section of the routine groups all expected frequencies less than a constant, usually 5, into a 
single cell. The Poisson is included as a special case of the Poisson with Zeros routine. 

All programs are written in Fortransit II (S) for an IBM 650 computer with special characters.
Running times vary widely between routines and data with the longest time on the
order of 20 minutes and the shortest less than one minute. Typical times range from 1 to 10 
minutes per distribution. An object program deck, a Fortransit statement deck, running 
instructions and test data will be provided on request. All inquiries should be addressed to:
The Department of Statistics, Florida State University, Tallahassee, Florida, attention 
S. K. Katti. 
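
The grouping rule described above (all expected frequencies below a constant, usually 5, are pooled into a single cell before the chi-square value is formed) can be sketched as follows for the simple Poisson case. This is an illustration only and not the Fortransit routine: the function names, the choice to put the upper tail in a final cell, and the example data are assumptions.

```python
import math


def poisson_expected(n_obs, lam, n_cells):
    """Expected frequencies for counts 0..n_cells-1, plus one upper-tail cell."""
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_cells)]
    probs.append(1.0 - sum(probs))  # probability of a count >= n_cells
    return [n_obs * p for p in probs]


def grouped_chi_square(observed, expected, min_expected=5.0):
    """Pool every cell whose expected frequency is below min_expected into a
    single cell, then return (chi-square value, number of cells used)."""
    kept_obs, kept_exp = [], []
    pooled_obs = pooled_exp = 0.0
    for o, e in zip(observed, expected):
        if e < min_expected:
            pooled_obs += o
            pooled_exp += e
        else:
            kept_obs.append(o)
            kept_exp.append(e)
    if pooled_exp > 0:
        kept_obs.append(pooled_obs)
        kept_exp.append(pooled_exp)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(kept_obs, kept_exp))
    return chi2, len(kept_exp)


if __name__ == "__main__":
    # Hypothetical frequency table: observed[k] = number of units with count k,
    # the last entry holding all counts of 6 or more.
    observed = [12, 30, 26, 18, 9, 4, 1]
    n = sum(observed)
    lam = sum(k * f for k, f in enumerate(observed)) / n  # moment estimate of the mean
    expected = poisson_expected(n, lam, len(observed) - 1)
    print(grouped_chi_square(observed, expected))
```

For the simple Poisson the moment estimate of the mean is also the maximum likelihood estimate, so no iteration is needed here; the other distributions named in the abstract would require an iterative scheme that this sketch does not attempt.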


6. Asymptotic Efficiency of a Class of c-Sample Tests. M. L. Puri, University 
of California, Berkeley. (By title) 


For testing the equality of c continuous probability distributions on the basis of c
independent random samples, the test statistics of the form

$L = \sum_{j=1}^{c} m_j [(T_{N,j} - \mu_{N,j})/A_N]^2$

are considered. Here $m_j$ is the size of the jth sample, $\mu_{N,j}$ and $A_N$ are normalizing constants
and $T_{N,j} = (1/m_j) \sum_{i=1}^{N} E_{N,i} Z_{N,i}^{(j)}$, where $Z_{N,i}^{(j)} = 1$ if the ith smallest of $N = \sum_{j=1}^{c} m_j$ observations
is from the jth sample and $Z_{N,i}^{(j)} = 0$ otherwise, and the $E_{N,i}$ are given numbers. Generalizing
results of Chernoff-Savage (Ann. Math. Stat., Vol. 29 (1958), pp. 972-994), sufficient
conditions are given for the joint asymptotic normality of $T_{N,j}$, $j = 1, \ldots, c$. Under suitable
regularity conditions and the assumption that the ith distribution function is

$F(x + \theta_i / N^{1/2})$

it is shown that as $N \to \infty$, the statistic L has a limiting noncentral chi-squared distribution.
The asymptotic relative efficiency in Pitman’s sense of the L test and the Kruskal-Wallis
H test (which is a particular case of the L test) is obtained and is shown to be
independent of c.
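
A minimal sketch of how such a statistic could be computed is given here, taking $T_{N,j}$ to be the average of the scores $E_{N,i}$ attached to the ranks occupied by the jth sample and assuming no ties. The function names are illustrative. The particular choice $E_{N,i} = i$, $\mu_{N,j} = (N+1)/2$ and $A_N^2 = N(N+1)/12$ is the standard one that reproduces the Kruskal-Wallis H statistic; it is supplied here as an assumption and is not taken from the abstract.

```python
import math


def l_statistic(samples, scores, mu, a_n):
    """L = sum_j m_j * ((T_j - mu) / a_n)**2, where T_j is the mean of the
    scores attached to the pooled ranks occupied by sample j (no ties)."""
    pooled = sorted((x, j) for j, sample in enumerate(samples) for x in sample)
    totals = [0.0] * len(samples)
    for i, (_, j) in enumerate(pooled):
        totals[j] += scores[i]  # scores[i] plays the role of E_{N,i+1}
    return sum(
        len(sample) * ((totals[j] / len(sample) - mu) / a_n) ** 2
        for j, sample in enumerate(samples)
    )


def kruskal_wallis_h(samples):
    """Kruskal-Wallis H obtained as the special case with rank scores."""
    n = sum(len(sample) for sample in samples)
    scores = [float(i) for i in range(1, n + 1)]
    return l_statistic(samples, scores, (n + 1) / 2, math.sqrt(n * (n + 1) / 12))


if __name__ == "__main__":
    print(kruskal_wallis_h([[1.1, 2.3, 3.0], [4.2, 5.5, 6.1], [7.7, 8.0, 9.4]]))
```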


7. Some Statistical and Operational Techniques in Reliability Studies (Preliminary
report). A. S. Qureishi, IBM Service Bureau Corp., San Jose,
California.


Given two production processes, the units from which fail in accordance with the Weibull 
Distribution, the problem is to select the particular process with the smallest failure rate.
It is assumed that there is a common guarantee period (known or unknown) during which 
no failures occur. It is desired to accomplish the above goal in as short a time as possible, 
thus maximizing the total gain without invalidating certain predetermined probability 
specifications. Three techniques, as suggested by Sobel (Bell System, Vol. 35 (1956) pp. 
179-202) are considered and three procedures are constructed to show their advantages and 
disadvantages. Sobel’s results on the assumption of exponential distribution turn out to be a 
particular case of the general solution presented in this paper. Expressions for average
experiment time required to terminate the experiment have been obtained and the efficiencies
of different procedures are compared. An alternative sequential procedure has been
proposed and shown to be valuable in some situations. It has been shown with the aid of the
tables (computed on Burroughs 220), that in some situations, one saves time and money if 
he assumes Weibull Distribution. Techniques have been developed to find the optimal 
sample size for each of the four procedures. Minimum Regret Principle has been applied to 
select the best procedure. The author is highly indebted to Professor N. L. Johnson, under 
whose supervision this research is being carried out.
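
The relation between the two failure models mentioned above can be illustrated with a short sketch. It assumes the usual three-parameter form of the Weibull distribution, in which the guarantee period acts as a location parameter; the abstract does not state its parameterization, so the names `scale`, `shape` and `guarantee` are assumptions. With shape equal to 1 the survival function reduces to that of an exponential distribution shifted by the guarantee period.

```python
import math


def weibull_survival(t, scale, shape, guarantee=0.0):
    """Probability that a unit survives beyond time t.

    No failures occur before the guarantee period; afterwards
    S(t) = exp(-((t - guarantee) / scale) ** shape).
    """
    if t <= guarantee:
        return 1.0
    return math.exp(-(((t - guarantee) / scale) ** shape))


def exponential_survival(t, mean_life, guarantee=0.0):
    """Shifted exponential survival function, the shape = 1 special case."""
    if t <= guarantee:
        return 1.0
    return math.exp(-(t - guarantee) / mean_life)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 4.0):
        print(t, weibull_survival(t, 2.0, 1.0, 1.0), exponential_survival(t, 2.0, 1.0))
```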


8. Asymptotic Bounds for the Zero-Crossing Probability Distribution of Stationary
Gaussian Processes. M. Rosenblatt, Brown University.


Let X(t) be a separable stationary Gaussian process with mean zero and covariance
function $r(t) = E[X(\tau) X(\tau + t)]$. Let $G(T) = P[X(t) > 0, 0 \le t \le T]$. Assume that $r(t) \to 0$
as $t \to \infty$. It is then shown that G(T) approaches zero faster than any inverse power of T
as $T \to \infty$. Stronger bounds are obtained for specific rates of decay of r(t) such as
$r(t) \sim t^{-\alpha}$, $\alpha > 0$, as $t \to \infty$. The basic tool is a powerful inequality of D. Slepian.





ABSTRACTS 
(Abstracts of papers not presented at any meeting of the Institute) 


1. A Contribution to the Sphere-Packing Problem of Communication Theory. 
A. V. BALAKRISHNAN, University of California, Los Angeles. (By title) 


It is shown that the “sphere packing” problem (optimal band-limited signal selection
for coherent Gaussian channels) can be reduced to the following extremal problem: “Given
an N-variate Gaussian, $\xi_1, \ldots, \xi_N$, with zero means and unit variances, maximize
$E(e^{\lambda \max_i \xi_i})$ for $\lambda > 0$, with or without restriction on the rank m of the covariance matrix.”
It is shown that for a given m if there is a solution independent of $\lambda$, then this is given by
maximizing the mean-width of the polyhedron generated by N unit vectors in Euclidean
m-space. With no restriction on m, $m \le N$, it is shown that if there is a nonzero $\lambda$-interval
for which the solution is independent of $\lambda$, this solution is given by the regular simplex in
(N − 1) dimensions. Additional results using generalized tetrachoric series are given for
the general problem.


2. Some Aspects of Statistical Invariance. David R. Brillinger, Bell Telephone
Labs, Murray Hill, N. J. (By title)


Necessary and sufficient conditions are presented for a statistical problem to be invariant
under a Lie transformation group. For the case of a (multi-dimensional) real-valued random
variable x with c.d.f. $F(x, \theta)$ the conditions reduce to:

(i) there must exist analytic functions $\gamma_i^j(\theta)$, $\sigma_i^j(x)$ such that

$$X_i F = \sum_j \gamma_i^j(\theta) \frac{\partial F}{\partial \theta^j} + \sum_j \sigma_i^j(x) \frac{\partial F}{\partial x^j} = 0, \quad \text{for all } i,$$

and

(ii) the infinitesimal generators $X_i$ generate a group. The paper continues with a theorem
concerning the distribution of statistics that are the maximal invariants of a compact
topological transformation group. The theorem generalizes the technique that James has
been using to derive some multi-dimensional distributions. The paper concludes with the
following theorem justifying the “re-use” of samples: consider a random variable x with
probability measure P. Let G be a set of measure preserving transformations of P. Let
$\phi(x)$ be an unbiased estimator of a, then $\int_G \phi(gx)\, d\mu(g)$ is also an unbiased estimator of a
and has smaller loss for any real valued convex loss function where $\mu(g)$ is any measure of
total mass 1 on G.
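
The closing theorem on the re-use of samples can be illustrated in the simplest setting, assumed here for illustration: the sample consists of independent, identically distributed observations, the transformations are the permutations of the coordinates (which preserve the joint distribution), and the measure of total mass 1 is the uniform one. The function names are hypothetical. Averaging the crude unbiased estimator "first observation" over all permutations then yields the sample mean.

```python
from itertools import permutations


def first_observation(sample):
    """A crude unbiased estimator of the mean: use only the first observation."""
    return sample[0]


def average_over_permutations(estimator, sample):
    """Average the estimator over every reordering of the sample, each
    reordering receiving equal weight (total mass 1)."""
    orderings = list(permutations(sample))
    return sum(estimator(p) for p in orderings) / len(orderings)


if __name__ == "__main__":
    sample = (1.0, 2.0, 6.0)
    print(first_observation(sample))
    print(average_over_permutations(first_observation, sample))
```

The enumeration grows factorially with the sample size, so this form is suitable only for very small samples; it is meant to show the averaging step, not to be an efficient procedure.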


3. The δ-Method for Banach Valued Random Variables. David R. Brillinger,
Bell Telephone Labs, Murray Hill, N. J. (By title)


The “δ-method” or method of “propagation of error” is extended so that it may be used
in deriving the asymptotic distribution of Banach valued functions of Banach valued
random variables. Define plim $x_n = \theta$, $\theta$ the 0 element of the Banach space, if for every
$\epsilon > 0$, $\lim P(\|x_n\| \le \epsilon) = 1$. Define zlim $x_n = \theta$, if for every $\epsilon > 0$ there exists an $A_\epsilon$
such that $P(\|x_n\| \le A_\epsilon) \ge 1 - \epsilon$ for all n. Theorems proved include:

(1) Let $x_n$, $y_n$ be sequences of Banach valued random variables with associated measures
$\mu_n$, $\nu_n$ on X. Let $\{\nu_n\}$ converge weakly to a probability measure $\mu$. If plim $(x_n - y_n) = \theta$,
then $\{\mu_n\}$ converges weakly to $\mu$.

(2) Let $x_n$, x induce measures $\mu_n$, $\mu$ on X. Let $\mu_n$ converge weakly to $\mu$. Let $g: X \to Y$
be a continuous map of X onto the Banach space Y. This map induces measures $\{\nu_n\}$, $\nu$
such that $\nu_n$ converges weakly to $\nu$.

(3) Let $\{A_n\}$ be a sequence of scalars such that $|A_n| \to \infty$. Let $g: X \to Y$ possess a Frechet
differential everywhere in X. If zlim $A_n(x_n - y_n) = \theta$ then plim $A_n[g(x_n) - g(y_n) - dg(y_n, h_n)] = \theta$
where $h_n = x_n - y_n$.
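
Theorem (3) can be checked numerically in the simplest Banach space, the real line, where the Frechet differential is $dg(y, h) = g'(y) h$. The sketch below is a deterministic illustration under assumed choices that do not come from the abstract: $g(x) = x^2$, $A_n = \sqrt{n}$, and a fixed scaled deviation $A_n (x_n - y_n) = 1.5$. The scaled remainder then equals $1.5^2 / \sqrt{n}$ and tends to zero.

```python
import math


def scaled_remainder(g, dg, y, h, a):
    """a * [g(y + h) - g(y) - dg(y) * h]: the quantity shown to vanish in (3)."""
    return a * (g(y + h) - g(y) - dg(y) * h)


def square(x):
    return x * x


def square_derivative(x):
    return 2.0 * x


if __name__ == "__main__":
    y = 2.0
    for n in (10, 1000, 100000):
        a_n = math.sqrt(n)
        h_n = 1.5 / a_n  # so that a_n * h_n stays bounded
        print(n, scaled_remainder(square, square_derivative, y, h_n, a_n))
```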


4. Some Fiducial Examples. David R. Brillinger, Bell Telephone Labs,
Murray Hill, N. J. (By title)


This paper presents three examples with fiducial relevance. The first example concerns 
the definition of joint fiducial distributions. Quotes are given from works of Fisher concern- 
ing the genuine fiducial argument. An example is given that appears to satisfy all of 
Fisher’s requirements, but yet that doesn’t lead to a unique fiducial distribution. The 
second example demonstrates that recent results of Lindley concerning the identity of a 
fiducial distribution and a Bayes’ posteriori distribution cannot be extended to higher
dimensions in the obvious manner. The third example is the following: let x be $N(\mu, 1)$. The
fiducial distribution of $\mu^2$ may be derived in two manners, firstly from the fiducial
distribution of $\mu$ by means of a Jacobian multiplier, secondly from the fact that $x^2$ is non-central
$\chi^2$ with parameter $\mu^2$. The results are different.





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items


Richard Lee Beatty, Assistant Professor at the University of Wyoming, is 
presently on leave for a year of study at Stanford. 

William H. Beyer completed his Ph.D. in Statistics at Virginia Polytechnic 
Institute in July, 1961, whereupon he joined the staff of the Department of 
Mathematics at the University of Akron (Ohio). He was formerly a member of 
the staff of the Department of Mathematics, V.P.I. 

B. Ramdas Bhat has completed his Ph.D. in Statistics at the University of
California, Berkeley and will be spending the academic year 1961-62 at Michigan 
State University as Assistant Professor of Statistics. Dr. Bhat is on leave from 
Karnatak University, Dharwar, India. 

Professor R. C. Bose of the Department of Statistics at the University of 
North Carolina, has been granted leave to carry on research in Europe during 
the academic year 1961-62. 

Mr. Victor Chew has accepted a part time position in the Department of 
Biostatistics of the Johns Hopkins University. He will divide his time between
the Dr. David Duncan research project and the Operations Research Branch of 
the U. S. Naval Weapons Laboratory at Dahlgren, Virginia, where he is a 
mathematical statistician. 

Dr. Edwin L. Crow, Consultant in Statistics at the Boulder Laboratories, 
National Bureau of Standards, is studying during the 1961-62 academic year 
at the Department of Statistics, University College, Gower Street, London, 
W.C.1, England. 

Shanti S. Gupta, Member of the Technical Staff of the Bell Telephone Lab- 
oratories and Adjunct Professor of Mathematics at New York University, is 
on leave during the academic year 1961-62 and has been appointed Visiting 
Associate Professor in the Department of Statistics at Stanford University. 

Francisco Azorin Poch, formerly of the Universidad Central de Venezuela 
teaching staff, has been appointed Professor of the Faculty of Science at the 
University of Santiago, Spain, beginning October 1st.

A. Salem Qureishi has joined the staff of IBM’s subsidiary, San Jose Data 
Processing Center, as a statistician. 

D. Raghavarao has completed his Ph.D. at the Bombay University. 

Dr. Harry G. Romig, previously senior scientist and staff member of Opera- 
tions Research, Inc., has been named corporate director of quality engineering 
of Leach Corporation. 

Professor Dudley E. South, University of Florida, has accepted an appoint- 
ment as visiting Professor of Mathematics, Florida Presbyterian College, St. 
Petersburg, for the academic year 1961-62. 

New Members
The following persons have been elected to membership in the Institute 


Armstrong, Robert K., B.A. (Boston University); Quality Engineer, Minneapolis/Honey- 
well, Commercial Division. 5th and 5th North, Minneapolis, Minnesota. 

Bakhit, Osman B., B.Sc. (University of Kartoum, India); Graduate Student, Biometrics 
Unit, School of Agriculture, Cornell University, Ithaca, New York; 224 Linden Avenue, 
Ithaca, New York. 

Blachman, Nelson M., Ph.D. (Harvard University) ; Consultant on Communication Theory, 
Electronic Defense Laboratories, Sylvania Electric Products, Inc., Box 205, Mountain 
View, California; 443 Ferne Avenue, Palo Alto, California. 

Bohidar, Neeti R., Ph.D. (Iowa State University); Assistant Professor of Statistics, Utah
State University, Logan, Utah. 

Chanda, Kamal C., Ph.D. (University of Manchester, England); Assistant Professor of 
Mathematics, University of Idaho, Moscow, Idaho. 

Elliott, Antony G. L., B.Sc. (University of Western Australia); Lecturer in Mathematical 
Statistics, University of New South Wales; Post Office Box 1, Kensington, NSW.; 1 
Yanko Road, Pymble, NSW. 

Farrell, Edward J., B.S., (University of Minnesota); Student at University of Minnesota; 
Remington Rand Univac, Univac Park, St. Paul 16, Minnesota; 1370 Berkeley Avenue, 
St. Paul 5, Minnesota. 

Foster, Louis A., M.S., (Purdue University); Graduate Student, Statistical Laboratory, 
Purdue University, Lafayette, Indiana. 

Giri, Narayan C., Ph.D. (Stanford University); Teaching Assistant, Statistics Department, 
Stanford University, Stanford, California. 

Glass, Donald N., B.S., (University of Wyoming); Chief of Computing, Computing Labo- 
ratory, University of Wyoming, Department of Statistics; 463 North 10th Street, Laramie, 
Wyoming. 

Godfrey, Milton L., B.M.E., (New York University); Director of Applied Sciences Division, 
CIER, 270 Park Avenue, New York 17, New York. 

Goldfab, Jay, M.B.A., (University of Pennsylvania); Operations Research Analyst, Johns- 
ville, Pennsylvania. 

Gopalan, M. N., M.Sc., (University of Mysore, India); Senior Technical Analyst, Indian 
Institute of Technology, Department of Mathematics, Post Office Box 11T, Powai, 
Bombay 76, India. 

Gray, Kenneth B. Jr., M.S., (Stanford University); Graduate Research Mathematician, 
U.C.L.A., Los Angeles 24, California; 3100 Sawtelle Boulevard, Los Angeles 66, California. 

Kern, Richard N., M.S. (St. Louis University); Supervisory Mathematician, National 
Security Agency, Fort George Meade; 904 Kenbrook Court, Silver Spring, Maryland. 

Kinney, John J., M.S. (University of Michigan); Assistant Professor of Mathematics, St. 
Lawrence University, Canton, New York; 55 Perine Street, Dansville, New York. 

Klebba, A. Joan, M.A. (Catholic University of America); Staff Statistician, National 
Office of Vital Statistics; 1469 South 28th Street, Arlington, 6, Virginia. 

Lowenstein, Regina S., A.M., (Columbia University); Statistician, Research Unit, 
Columbia University School of Public Health and Administrative Medicine, 21 Audubon
Avenue, New York 33, New York; 33 West 63rd Street, New York 23, New York.

Nederlof, Marinus Herman, Ph.D., (Leiden State University, The Netherlands); Office 
Geologist for Compania Petrolera Boliviana Shell Ltda., Casilla de Correos 20, La 
Paz, Bolivia; c/o Ir. O. B. Blosma, Graaf Jan laan 11, Naarden, The Netherlands. 

Pate, James L., M.A. (University of Alabama); Instructor, University of Alabama; Box
4211, University, Alabama.

Pessin, Vivian, M.A. (Columbia University); Statistician, Childrens Hospital, 219 Bryant
Street, Buffalo 22, New York.

Peugh, Laura V., M.S. (Purdue University); Mathematician, David Taylor Model Basin,
Carderock, Maryland. 

Powell, Joe L., M.S. (Tulane University); Assistant Director, Biomathematics Labo- 
ratory, Baylor University College of Medicine, Texas Medical Center, Houston, Texas; 
8245 Park Place Boulevard, Apt. 6, Houston 17, Texas. 

Reyment, Richard A., D.Sc. (University of Stockholm); Docent at the University of Stock- 
holm, Department of Geology, Kungstengatan 45, Stockholm Va., Sweden. 

Rhode, Charles A., B.S. (Case Institute of Technology, Cleveland, Ohio); Graduate 
Assistant, Institute of Statistics, North Carolina State College, Raleigh, North Carolina. 

Rowe, Peter M., M.B.A., (New York University) ; Student, New York University Graduate 
School of Arts and Sciences, Washington Square, New York; 211 Alpine Drive, Closter,
New Jersey. 

Sarndal, Carl-Erik, Fil.Lic. (University of Lund, Sweden); Post-doctoral fellow, Depart- 
ment of Biostatistics, University of North Carolina, Chapel Hill, North Carolina. 

Schafer, Ray E., M.B.A. (Western Reserve, Cleveland, Ohio); MTS Statistician, Hughes 
Aircraft Corporation, Ground Systems Group, Box 2097, Fullerton, California, Mail 
Station E 116, Building 393; 11652 Stuart Drive, Garden Grove, California. 

Sellheim, H. Dale, B.S., (University of North Dakota, Grand Forks); Senior Mathe- 
matician, C-E-I-R, 1200 Jefferson Davis Highway, Arlington 2, Virginia; 6713 14th 
Street N.W., Washington, D.C. 

Seri, Armand, M.S. (University of Wichita); Graduate Student, University of Illinois, 
Urbana; 905 South First Street, Apt. 24, Champaign, Illinois. 

Tantuco, Guillermo N., M.A., (University of Michigan); Tantuco Enterprises, 311 Ayala 
Building, Juan Luna Street, Manila, Philippines. 

Tierney, Thomas R., B.S., (Temple University); Student; 1926 North Broad Street,
Philadelphia 21, Pennsylvania.

Whitton, Howard J. G., B.Sc., (University of London); Senior Mathematician, Trans 
Canada Airlines (O.R. Department) Room 907, Terminal Centre Building, 1060 Uni- 
versity Street, Montreal 3, Quebec, Canada; 2407 Lockhart Avenue, Montreal 16, Quebec, 
Canada. 

Williams, Bryan Mc., A.B., (George Washington University); Booz-Allen Applied Research,
Inc., 4921 Auburn Avenue, Bethesda, Maryland; 4611 River Road, N.W., Washington
1, D.C.

Winter, Ralph P., M.Ed., (College of St. Thomas, St. Paul, Minnesota); Graduate Student, 
University of Minnesota, Minneapolis 14; 5712 38th Avenue South, Minneapolis 17, 
Minnesota. 

INTERNATIONAL ABSTRACTS IN OPERATIONS RESEARCH


The Operations Research Society of America and the International Federation
of Operations Research Societies announce the publication of a new journal,
International Abstracts in Operations Research. Herbert P. Galliher (Massachu- 
setts Institute of Technology) will be the Editor. 

The new journal will contain abstracts of articles selected, on the basis of 
their relationship to Operations Research, from a large number of journals. The 
first issue is scheduled to appear in November, 1961. An annual subscription
will cost $12.50 in the United States and Canada, $10 elsewhere. 

Orders for subscriptions should be sent to the Business Manager, Operations 
Research Society of America, Mt. Royal and Guilford Avenues, Baltimore 2,
Maryland, U.S.A. Editorial communications should be addressed to the Editor, 
International Abstracts in Operations Research, Room 6-218, Massachusetts
Institute of Technology, Cambridge 39, Mass., U.S.A. 




FELLOWSHIP AND RESEARCH OPPORTUNITIES 


The Division of Mathematics, National Academy of Sciences—National 
Research Council calls attention to a variety of fellowships and other support 
for basic research in mathematics to be awarded by agencies of the Federal 
Government during the year 1961-62. A list of sources of support is given in 
the bulletin, “A Selected List of Major Fellowship Opportunities and Publications
for Educational Support,” available from the Fellowship Office, National
Academy of Sciences—National Research Council, 2101 Constitution Avenue, 
Washington 25, D. C. 


PRELIMINARY ACTUARIAL EXAMINATIONS 
PRIZE AWARDS ANNOUNCED 


The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of the General Mathematics 
Examination of the 1961 Preliminary Actuarial Examinations are as follows: 


First Prize of $200 


Wells, John C. Massachusetts Institute of Technology 


Additional Prizes of $100 each 
Bender, Edward A. California Institute of Technology 
Butler, William A. Queen’s University 
Corwin, Lawrence J. Harvard University 
Hochster, Melvin Harvard University 
Mather, John N. Harvard University 
Segal, David M. Harvard University 
Waterhouse, William C. Harvard University 
Weiss, Norman J. Harvard University 


The Society of Actuaries has authorized a similar set of nine prizes for 1962. 
The Preliminary Actuarial Examinations consist of two examinations: The
General Mathematics Examination (based on the first two years of college 
mathematics), and The Probability and Statistics Examination. The 1962 
Preliminary Actuarial Examinations will be prepared by the Educational Testing 
Service under the direction of a committee of actuaries and mathematicians, and 
will be administered by the Society of Actuaries at centers throughout the 
United States and Canada on November 15, 1961 and on May 9, 1962. The 
closing date for applications is April 1, 1962. Further information concerning 
these Examinations can be obtained from the Society of Actuaries, 208 South 
LaSalle Street, Chicago 4, Illinois. 

DISCOUNT FOR
THEORY OF PROBABILITY AND ITS APPLICATIONS 


The Society for Industrial and Applied Mathematics has announced that it 
is extending its discount subscription rate for Theory of Probability and its
Applications (a cover-to-cover English translation of the Soviet journal, Teoriya
Veroyatnostei i ee Primeneniya) to the individual members of the organizations 
comprising the Conference Board of the Mathematical Sciences. In particular, 
members of the Institute of Mathematical Statistics may now subscribe at the 
rate of $9.50 a year; subscriptions may be entered beginning with Volume 1 
(1956) or beginning with later volumes. 

Volumes 1 (1956) through 4 (1959) are now complete and available. The 
first three of these are bound in hard covers. All the 1960 issues should be avail- 
able by the end of 1961. 

Subscription orders, and requests for further information, should be sent to
SIAM, Box 7541, Philadelphia 1, Pennsylvania, U. S. A. Membership in the
Institute of Mathematical Statistics should be mentioned. 




SYMPOSIUM ON REDUNDANCY TECHNIQUES 
FOR COMPUTING SYSTEMS 


A symposium on Redundancy Techniques for Computing Systems, sponsored 
by the Information Systems Branch, Office of Naval Research, will be held on 
6 and 7 February 1962. This Symposium will be held in the Department of the 
Interior Auditorium, C Street between 18th and 19th Streets, N. W., Washington,
D. C.

The objective of the Symposium is to focus attention on new ideas, research, 
and developments which may lead to the introduction of redundancy techniques 
into forthcoming computing systems. It is apparent that many information 
processing systems of the future will be of such size that conventional methods
of fabrication and of maintenance will be completely infeasible. Computers for 
space applications have reliability requirements which are presently not achiev- 
able without use of redundancy. Thus, some form of logical redundancy must be 
introduced if practical systems are to be operated. Accordingly, it is planned to 
present in a single symposium a collection of related papers concerning suggested 
new techniques for the use of redundancy in computing systems. This should 
permit relative evaluation of such techniques with respect to their probable 
future utilization in various aspects of the computing field. The program will 
consist of papers invited from many of the organizations engaged in appropriate 
research and development activities. 

Attendance is open to all interested technical personnel. Further information 
and a preliminary Symposium program, when available, may be obtained from 
Miss Josephine Leno, Code 430A, Office of Naval Research, Washington 25, 
D. C., (Phone OXford 6-6213). 







REPORT OF THE PRESIDENT FOR 1961 


The past year has been one of continued growth for the Institute as for the 
statistical profession as a whole. 


Publications 


As announced in the March Annals, the year saw the publication of the first 
two volumes of the Statistical Research Monographs, jointly sponsored by the
Institute and the University of Chicago. Another new series, of which the first 
volume was published this year, is the Selected Translations in Mathematical 
Statistics and Probability, which is being published jointly by the Institute and 
the American Mathematical Society. Further volumes in both series are in 
preparation. 

The most central activity of the Institute is probably its publication of the 
Annals of Mathematical Statistics. The Annals was founded in 1930 by Professor 
H. C. Carver, and he edited the journal—which was published first under the 
auspices of the American Statistical Association and later of the IMS—until
1938. In recognition of the great debt we all owe to Professor Carver for this 
farsighted and courageous action which he undertook against great odds and 
discouragement, and as a token of our gratitude to him, the Council is dedicating 
the 1961 volume of the Annals to Professor Carver on the occasion of his retire- 
ment from his Professorship at the University of Michigan. 

The end of the Institute year marks the expiration of the term of office of the 
Editorial Board of the Annals. I wish to express the gratitude of the Institute 
to this board and in particular to the retiring Editor, William Kruskal, for his 
untiring and most successful work during the last three years. At the recom- 
mendation of a committee consisting of L. J. Savage (Chairman), D. R. Cox, 
T. E. Harris and C. M. Stein, the Council asked J. L. Hodges, Jr., to take over 
the Editorship and Professor Hodges has accepted this appointment. 

The Council decided at the Stanford meeting that it would be desirable to 
review the organization of the Annals, and in particular to investigate whether 
the Institute should start a separate probability journal. Much thought was 
given to these problems by a committee consisting of T. E. Harris (Chairman), 
D. Blackwell, J. Daly, J. L. Doob, J. L. Hodges, Jr., W. Kruskal and W. Smith. 
The committee recommended against the establishment of a separate prob- 
ability journal. Regarding the organization of the Annals, it pointed to the 
imperative need to cut down the work load of the Editor, and to that end recom- 
mended that the Editor should be freed of the duties connected with publishing 
(as opposed to editing) the Annals. The Committee also suggested that the 
whole problem of Annals reorganization should be considered again before the 
end of the term of the incoming Editor. 


Meetings 


An important feature of the meetings held this year, a spring meeting of the 
Eastern Region at Cornell and the Annual meeting in Seattle, were a number of 
Special Lectures arranged by the Committee on Special Lectures, consisting of 
Leo Katz (Chairman), G. E. P. Box, D. Gilford, J. Kiefer, H. Robbins, H.
Scheffé, and D. Wallace. The 8th Rietz lecture was delivered by Professor 
Blackwell on “Dynamic Programming” and the 4th series of Wald lectures by 
Professor Stein on the topic “Estimation of Many Parameters.” In addition, 
Special Invited Papers were presented by Dr. T. Dalenius, Professor D. Dugué 
and Professor M. Fisz. 

At the Stanford meeting, the Council decided to initiate the holding of Euro- 
pean regional meetings in addition to the Annual meeting and to the meetings 
of the American regions. At the recommendation of a Committee consisting of 
H. Theil (Chairman), J. Durbin, G. Elfving, M. Fisz, R. Fortet, U. Grenander, 
A. Hald, J. Hemelrijk, A. Linder, D. Lindley, L. Schmetterer and E. Sverdrup, 
the Council at the Seattle meeting accepted invitations to hold such meetings 
in the summer of 1962 in Dublin and of 1963 in Copenhagen. 

After an interruption of several years, we hope to resume the Summer Institutes
which were so successful in 1955-58. At the recommendation of a committee
consisting of J. Kiefer (Chairman), H. Chernoff, D. Gilford, J. Pratt and M. 
Sobel, the Council accepted the invitation of Michigan State University to hold 
a Summer Institute there in 1963 on the subject, “Statistical inference in stochastic
processes”; the committee hopes also to make arrangements for such an
Institute for 1962. 

Membership 


The Committee on Institutional Memberships under the chairmanship of 
Mervin Muller has had another very successful year. President-Elect Bowker 
organized a drive for individual members which also met with considerable 
success. Finally, a European Membership Committee has just started function- 
ing under the chairmanship of Eugene Lukacs. 

It is a great pleasure to congratulate the new Fellows of the Institute: R. 
Bradley, T. Dalenius, C. Derman, M. Dwass, D. Gilford, M. V. Johns, Jr., E.
Parzen, and I. R. Savage. 


Visiting Scientist’s Program

Because of the critical shortage, in many areas, of qualified statisticians, the
Council at the Stanford meeting discussed the desirability of a Visiting Scientist’s
Program in Statistics and Probability, which would bring the nature of the
field and its potentialities to the attention of prospective students and their
advisers. A proposal to the National Science Foundation for support of such a
program has been prepared by a committee consisting of R. Bradley (Chairman),
D. Blackwell, D. Chapman, J. W. Hamblen, B. Harshbarger, R. Pinkham, L.
Snell and S. S. Wilks.


The Role of IMS and Its Relationship with Other Societies 


Although the IMS was founded in the United States and most of its activities 
have been centered there, the constitution has never provided for a national 
affiliation. At present, about one fourth of the IMS membership is outside the 
U.S.A., and the proportion has been increasing. 

With the rapid development of statistics in many parts of the world, there is
an increasing need for a nonrestrictive and truly international statistical organiza- 
tion, with emphasis on theory. It is possible that, in time, the IMS could develop 
into such an organization. The whole problem of the international role of IMS, 
including in particular the question of whether IMS should affiliate with the 
International Statistical Institute, is being studied by a committee consisting 
of W. G. Cochran (Chairman), M. S. Bartlett, J. Durbin, D. M. Gilford, U.
Grenander, A. Hald, J. Neyman, G. E. Nicholson and J. W. Tukey. 

In the U.S.A., the growth of statistics has posed the problem of developing an 
organization that could effectively speak for the statistical community as a 
whole. For this purpose, the American Statistical Association organized a meeting 
of statistical societies, at which our representatives were H. Scheffé and S. S.
Wilks. At the meeting, a conference board of statistical societies was discussed, 
similar to the Conference Board of Mathematical Sciences of which IMS is a 
member, and a committee was appointed to draft a constitution for such a 
board. 

A great deal of work of the Institute is done by its committees, and I should
like to thank the many persons who accepted committee service and worked
hard to make it a success. I should also like to acknowledge the great personal
debt I owe to President-Elect Albert H. Bowker and Program Coordinator
Dorothy Gilford for their invaluable advice and assistance throughout the year.
In conclusion I should like to announce the Nominating Committee for next 
year, which consists of H. Raiffa (Chairman), H. E. Daniels, P. Meier, D. 
Owen, R. Pyke, C. R. Rao, A. Rényi, I. R. Savage, and M. Zelen. 

E. L. Lehmann 


———————————— 


IMS OFFICERS, COMMITTEES, AND REPRESENTATIVES—1961 


Council Members and Officers 
Terms Expire 1961 Terms Expire 1962 
F. J. Anscombe T. W. Anderson 
T. E. Harris Z. W. Birnbaum 
Leo Katz J. L. Hodges, Jr. 
S. S. Wilks Wassily Hoeffding 
Terms Expire 1963 Terms Expire 1964 
Herman Chernoff R. L. Anderson 
Kai-Lai Chung D. Blackwell 
M. G. Kendall W. Kruskal 
Gerald J. Lieberman S. N. Roy 
Charles Stein 
One Year Terms 
Joseph Berkson 
David L. Wallace 
President: E. L. Lehmann Program Coordinator: D. M. Gilford 
President-Elect: A. H. Bowker Associate Secretaries 
Secretary: G. E. Nicholson, Jr. Central: D. Burkholder 
Treasurer: Gerald Lieberman Eastern: Joan Rosenblatt 
Editor: W. H. Kruskal Western: Fred Andrews 







NEWS AND NOTICES 


COMMITTEES—1961 
(The first person named is the chairman) 
COMMITTEE ON ANNALS INDEX 
I. R. Savage, T. E. Harris, J. L. Hodges, Jr., W. H. Kruskal, G. E. Nicholson, Jr. 


BROCHURE COMMITTEE 


E. Parzen, D. Blackwell, A. H. Bowker, J. F. Daly, B. G. Greenberg, M. H. Hansen, 
G. E. Nicholson, Jr., S. S. Wilks 


COMMITTEE ON EXCHANGES 


Work of committee now performed by G. E. Nicholson, Jr. 


COMMITTEE ON FELLOWS 
H. Solomon (1961), E. J. G. Pitman (1961), W. G. Cochran (1962), J. L. Hodges, Jr. 
(1962), G. Elfving (1963), I. Olkin (1963) 
COMMITTEE ON FINANCE 


H. Levene, K. J. Arnold, K. A. Brownlee, J. Curtiss, G. Lieberman 


COMMITTEE ON INSTITUTIONAL MEMBERS 
M. E. Muller, K. J. Arnold, E. L. Crow, M. Dwass, E. Lukacs, P. D. Minton, 
R. B. Murphy, E. Seiden 


COMMITTEE ON MATHEMATICAL TABLES 


D. B. Owen, R. L. Anderson, A. H. Bowker, P. C. Cox, W. J. Dixon, C. Eisenhart, 
J. A. Greenwood, S. S. Gupta, H. O. Hartley, H. L. Harter, L. Katz, F. C. 
Leone, M. E. Muller, P. Olmstead, G. P. Steck, M. A. Woodbury, M. Zelen 


COMMITTEE ON MEMBERSHIP 
A. H. Bowker 
NOMINATING COMMITTEE—1962 ELECTION 


Howard Raiffa, H. E. Daniels, P. Meier, D. B. Owen, R. Pyke, C. R. Rao, 
A. Rényi, I. R. Savage, M. Zelen 


COMMITTEE ON PROFESSIONAL STANDARDS 


J. Lev, R. W. Burgess, C. Eisenhart, G. M. Harrington, B. Kimball, 
A. W. Kimball, H. Marshall, R. Patton, J. Walsh 


PROGRAM COMMITTEE FOR ANNUAL MEETING 


D. Wallace, F. Anscombe (Vice-Chairman), J. Blum, J. L. Folks, R. Gnanadesikan, 
A. T. James, R. Mickey, R. Miller, R. Radner, G. Watson, G. E. Nicholson, Jr., 
D. M. Gilford (ex officio) 


PROGRAM COMMITTEE FOR CENTRAL REGIONAL MEETING 


R. A. Wijsman, J. Berkson, G. E. P. Box, V. R. Hogg, M. Katz, Jr., P. Minton, 
H. Rubin, M. Sobel, O. Wesler, D. R. Whitney 







PROGRAM COMMITTEE FOR EASTERN REGIONAL MEETING 

Mervin E. Muller, R. Bechhofer, R. A. Bradley, C. Daniel, A. P. Dempster, 
S. W. Greenhouse, M. Kastenbaum, E. Paulson, W. L. Smith, 
H. Teicher, R. Wormleighton, M. Zelen 
PROGRAM COMMITTEE FOR EUROPEAN MEETING 
H. Theil (Rotterdam), G. Elfving (Stanford), R. Fortet (Paris), M. Fisz (Seattle), U. 
Grenander (Stockholm), A. Hald (Copenhagen), J. Hemelrijk (Amsterdam), D. V. 
Lindley (Cardiganshire, Wales), A. Linder (Geneva), L. Schmetterer (Hamburg), 
E. Sverdrup (Oslo) 


COMMITTEE ON THE INTERNATIONAL ROLE OF IMS 
W. Cochran, M. S. Bartlett, J. Durbin, D. Gilford, U. Grenander, 
A. Hald, J. Neyman, G. Nicholson, J. W. Tukey 


COMMITTEE TO SELECT A NEW EDITOR 
L. J. Savage, D. Cox, T. Harris, C. Stein 
COMMITTEE ON SPECIAL PAPERS 
L. Katz, G. E. P. Box, J. Kiefer, I. Olkin, H. Robbins, H. Scheffé, 
D. Gilford (ex officio), D. Wallace (ex officio) 

COMMITTEE ON SUBSCRIPTIONS 
E. P. Coleman, J. K. Adams, C. B. Bell, K. A. Bush, L. R. Elveback, H. Harmon 
COMMITTEE ON ANNALS REORGANIZATION 


Harris, D. Blackwell, J. F. Daly, J. L. Doob, J. L. Hodges, Jr., W. H. Kruskal, 
W. Smith 




COMMITTEE TO REEXAMINE IMS FROM VIEWPOINT OF 
YOUNGER MEMBERS 
W. L. Smith, P. Billingsley, S. H. Brooks, B. W. Brown, Jr., D. L. Burkholder, M. B. 
Danford, M. H. DeGroot, A. P. Dempster, D. A. Gardiner, W. J. Hall, J. W. Hamblen, 
J. E. Jackson, A. Madansky, H. E. McKean, R. Pyke, D. L. Wallace, O. Wesler, 
R. A. Wijsman 


COMMITTEE ON RUSSIAN TRANSLATIONS 


W. Hoeffding, M. Dwass, E. Lukacs, I. Olkin, L. Schmetterer, W. Kruskal (ex officio) 


COMMITTEE TO ORGANIZE A SUMMER INSTITUTE 
J. Kiefer, H. Chernoff, J. Pratt, M. Sobel 
COMMITTEE ON THE VISITING SCIENTISTS PROGRAM 


R. Bradley, D. Blackwell, D. Chapman, J. W. Hamblen, B. Harshbarger, 
R. Pinkham, L. Snell, S. S. Wilks 


COMMITTEE ON THE RELATIONSHIPS AMONG STATISTICAL SOCIETIES 
H. Scheffé, S. S. Wilks 
REPRESENTATIVES TO PROFESSIONAL ASSOCIATIONS—1961 
IMS REPRESENTATIVE TO AAAS—Harold Hotelling 


AMERICAN STANDARDS ASSOCIATION COMMITTEE ON STATISTICAL 
NOMENCLATURE: IMS REPRESENTATIVE—P. G. Hoel 
IMS REPRESENTATIVE TO CONFERENCE ORGANIZATION OF THE 
MATHEMATICAL SCIENCES—W. M. Rosenblatt, Z. W. Birnbaum 
IMS REPRESENTATIVE IN DIVISION OF MATHEMATICS—NATIONAL 
RESEARCH COUNCIL—F. C. Mosteller 
IMS REPRESENTATIVES ON JOINT ASA-IMS BROCHURE COMMITTEE 
S. S. Wilks, D. Blackwell, B. G. Greenberg, W. H. Kruskal 
IMS REPRESENTATIVES TO AMS-IMS COMMITTEE ON RUSSIAN 
TRANSLATIONS—W. Hoeffding, I. Olkin 


REPORT OF THE EDITOR FOR 1961 


During the operating year August 1, 1960 to July 31, 1961, there were 225 
manuscripts submitted to the Annals. Final decisions were made for 219 manu- 
scripts during the 1960-61 operating year, and 185 manuscripts were under 
editorial consideration on July 31, 1961. 

A detailed statistical report of Annals operations in 1960-61 will be sent to 
interested members on request. 

William Kruskal’s term as Editor ended on June 30, 1961; J. L. Hodges, Jr. 
began his editorship on July 1, 1961. Kruskal will, however, continue to be 
responsible for editorial decisions about manuscripts submitted on or before 
June 30, 1961, and still under consideration at that date. 

It is a pleasure to express renewed gratitude to the Associate Editors for their 
difficult and effective work. Mrs. Robert Isherwood, Editorial Assistant, has 
devoted herself to the complex task of managing and carrying out the many 
tasks that arise in publishing the Annals; we thank her deeply. The Universities 
of Chicago and California have continued to provide material aid. 

Finally, we have the pleasure of listing the names of referees of papers for which 
final editorial decisions have been made during the period from October 1960 
to September 1961 inclusive. Their work is essential in carrying out our editorial 
procedures, and we express to them our great appreciation. 

William Kruskal, Past Editor 
J. L. Hodges, Jr., Editor 


Addelman, Sidney 
Anderson, R. L. 
Anderson, T. W. 
Anscombe, F. J. 
Armitage, P. 
Arrow, Kenneth 
Bahadur, R. R. 
Balakrishnan, A. V. 
Barlow, Richard E. 
Baxter, Glen 
Bharucha-Reid, A. T. 
Billingsley, Patrick 
Birnbaum, Allan 
Blackwell, David 
Blum, Julius R. 
Blumenthal, R. M. 
Bofinger, V. 
Borges, Rudolf E. 
Bose, R. C. 
Box, G. E. P. 
Brillinger, David 
Brunk, H. D. 
Buehler, Robert J. 
Burkholder, D. L. 
Chiang, C. L. 
Chakravarti, I. M. 
Chapman, D. G. 
Cogburn, Robert 
Connor, William S. 
Craswell, Keith 
Daly, Joseph F. 
Daniels, H. E. 
Darling, Donald A. 
Dean, B. V. 
DeGroot, Morris 
Dempster, A. P. 
Derman, Cyrus 
Dhruvarajan, P. S. 
Dixon, W. J. 
Duncan, David B. 
Dwass, Meyer 
Eicker, Friedhelm 
Epstein, Ben 
Fisz, M. 
Folks, J. Leroy 
Foster, F. G. 
Fraser, D. A. S. 
Ghurye, S. G. 
Gnanadesikan, R. 
Good, I. J. 
Goodman, Leo A. 
Graybill, F. 
Grenander, Ulf 
Hadley, George 
Hall, W. J. 
Halperin, Max 
Hammersley, J. M. 
Harter, H. Leon 
Hastings, W. Keith 
Herbach, Leon 
Hill, Bruce M. 
Hoeffding, Wassily 
Hogg, Robert V. 
Hook, Robert 
Hotelling, Harold 
James, A. T. 
Jeeves, J. A. 
Johnson, N. L. 
Jones, Richard 
Kallianpur, Gopinath 
Katz, Melvin L. 
Kemperman, J. H. B. 
Kempthorne, Oscar 
Kiefer, Jack 
Kruskal, Martin D. 
Kullback, S. 
Laha, R. G. 
Lamperti, John 
Laurent, André 
LeCam, Lucien 
Lehmann, E. L. 
Lieberman, Gerald 
Lindley, Dennis V. 
Lloyd, S. P. 
Loève, Michel 
Lukacs, Eugene 
Madansky, A. 
Mallows, Colin L. 
Marsaglia, G. 
Marshall, Albert W. 
Mason, D. D. 
Massey, Frank, Jr. 
Mesner, Dale 
Meyer, Paul 
Miller, Rupert G. 
Moses, Lincoln 
Mulholland, H. P. 
McCarthy, Philip 
McKean, Harlley 
Nash, Stanley 
Noether, Gottfried 
Olkin, Ingram 
Owen, Donald B. 
Parzen, Emanuel 
Patel, M. S. 
Paulik, Gerald 
Paulson, E. 
Plackett, R. L. 
Pratt, John 
Proschan, Frank 
Pyke, Ronald 
Rényi, A. 
Richter, Donald 
Robbins, Herbert 
Robson, D. S. 
Rosenblatt, Murray 
Roy, S. N. 
Ruben, Harold 
Rubin, Herman 
Rustagi, J. S. 
Saunders, S. 
Savage, I. R. 
Savage, L. J. 
Schaufele, Ronald 
Scheffé, Henry 
Schwarz, Gideon 
Shah, B. V. 
Siegert, Arnold 
Smith, Walter 
Snell, J. Laurie 
Sobel, Milton 
Spitzer, Frank 
Sprott, David A. 
Steck, G. P. 
Stein, Charles M. 
Takács, L. 
Tate, R. F. 
Teicher, Henry 
Teichroew, D. 
Thomasian, Aram J. 
Thompson, Donovan 
Thompson, W. A., Jr. 
Trotter, Hale F. 
Truax, Donald R. 
Tucker, Howard 
Tukey, John W. 
Tweedie, M. C. K. 
Ury, Hans 
Wallace, David L. 
Watson, G. S. 
Weiss, Lionel 
Welch, B. L. 
Wendel, James 
Wesler, Oscar 
Whittle, Peter 
Wijsman, Robert 
Wilk, M. B. 
Williams, J. 
Wolfowitz, J. 
Ylvisaker, Donald 
Zelen, Marvin 


DOCTORAL DISSERTATIONS IN STATISTICS, 1960 


The following dissertation titles were received too late for inclusion in 
the September 1961 Annals. 


Ed. 


Martin Arthur Arkowitz, Cornell University, major in geometry, algebra, analysis, “The 
Generalized Whitehead Product.” 

K. M. Ferrin, University of California, Los Angeles, major in statistics, “Multiple Decision 
Procedures for Normal Populations.” 

Yoichiro Fukuda, University of California, Los Angeles, major in statistics, “Estimation 
Problems in Inventory Control.” 

R. I. Jennrich, University of California, Los Angeles, major in statistics, “Analysis of 
Variance in the General Mixed Model.” 

Alan Gustave Konheim, Cornell University, major in analysis, algebra, applied mathematics, 
“Some Properties of a Class of Finite Trigonometric Sums.” 

Samuel Kotz, Cornell University, major in analysis, algebra, geometry, “Exponential 
Bounds for the Probability of Error in Discrete Memoryless Channels.” 

E. M. Scheuer, University of California, Los Angeles, major in statistics, “Simultaneous 
Estimation for Means of Medians of Dependent Random Variables without Distribution 
Assumptions.” 




PUBLICATIONS RECEIVED 


Anuario Estadístico de España, 1961 Edición Manual, Presidencia del Gobierno, Instituto 
Nacional de Estadística, Ferraz 41, Madrid, Spain, 1001 pp. 

Beckenbach, Edwin and Richard Bellman, An Introduction to Inequalities, Random House, 
New York, 1961, paperback, 133 pp. $1.95. 

Davis, Philip J., The Lore of Large Numbers, Random House, New York, 1961, paperback, 
165 pp. $1.95. 

Friend, J. Newton, More Numbers: Fun and Facts, Charles Scribner’s Sons, New York, 
1961, 201 pp. $2.95. 

Gersdorff, Ralph von, Portugals Finanzen: Geschichtlicher Überblick Die Finanzreformen 
Prof Salazars Steuer-und Staatsschuldenwesen, Verlag Ernst und Werner Gieseking, 
Bielefeld, 1961, 280 pp. 

Hidano, Naoru and Masatoshi Sedani, Statistics for Psychology and Education (in Japanese), 
Baifukan Co., Tokyo, Japan, 1961, 346 pp. 

“Investing in Scientific Progress, 1961-1970: Concepts, goals, and projections,” National 
Science Foundation, Washington, D. C., 1961, 30 pp. 

Kazarinoff, Nicholas D., Geometric Inequalities, Random House, New York, 1961, paper- 
back, 132 pp. $1.95. 

Mosteller, Frederick, Robert E. K. Rourke, and George B. Thomas, Jr., Probability: A 
First Course, Addison-Wesley Pub. Co., Reading, Mass., 1961, 319 pp. $5.00. 

Mosteller, Frederick, Robert E. K. Rourke, and George B. Thomas, Jr., Probability With 
Statistical Applications, Addison-Wesley Pub. Co., Reading, Mass., 1961, 478 pp. $6.50. 

Murray, Francis J., Mathematical Machines, Vol. I: Digital Computers, Columbia University 
Press, New York, 1961, 300 pp. $12.50. 

Murray, Francis J., Mathematical Machines, Vol. II: Analog Devices, Columbia University 
Press, New York, 1961, 365 pp. $17.50. 

Niven, Ivan, Numbers: Rational and Irrational, Random House, New York, 1961, paper- 
back, 136 pp. $1.95. 

Rasch, G., Probabilistic Models for Some Intelligence and Attainment Tests, Studies in 
Mathematical Psychology I, Danmarks Paedagogiske Institut, Denmark, 1960. 

Salkind, Charles T., Ed., The Contest Problem Book, Random House, New York, 1961, 
paperback, 152 pp. $1.95. 

Sawyer, W. W., What is Calculus About? Random House, New York, 1961, paperback, 118 
pp. $1.95. 








ADVERTISING IN 


THE ANNALS of 
MATHEMATICAL STATISTICS 


ADVERTISEMENTS for books, recruitment of professional 
personnel, etc., may now be placed in the Annals of 
Mathematical Statistics. Only full-page and half-page 
advertisements will be accepted. For details about 
costs, deadlines, sizes, and so on, please write to 


Mr. Edgar M. Bisgyer 
Advertising Manager 
American Statistical Assn. 
1757 K Street, N.W. 
Washington 6, D. C. 





DECISION AND 
THE RANDOM CHANNEL 


For some time now a firm mathematical discipline, Statistical 
Decision Theory, has been applied to optimize signal reception 
in many important communication situations. Its most con- 
spicuous success has been in the restricted problem of the 
reception of a (not too large) number of exactly known signals 
corrupted by additive independent gaussian noise with a known 
covariance function. Optimum reception in this case may be 
obtained through use of a correlation (or matched filter) receiver 
which uses replicas of the known signals stored either as wave- 
forms or as the impulse responses of linear, time-invariant filters. 
When faced with the lack of specific knowledge of channel 
parameters, decision theory is less authoritative in specifying 
the form of the receiver. In this more difficult case, it is some- 
times plausible to use an adaptive approach and estimate the 
possible waveforms which could exist at the receiver before 
noise is added, and to crosscorrelate the noisy received waveforms 
against these estimates. In fact, the application of decision theory 
to a channel which can be described by a multidimensional 
gaussian process with a zero mean and known 
covariance function, leads specifically to an adaptive receiver 
which generates reference waveforms from such estimates. 
Generalized model of a communication channel. 


At the Philco Research Center, scientists are continuing to 
study adaptive techniques. Typical programs include 1.) the 
problems of communicating over channels whose statistics may 
be both poorly known and changing; and 2.) investigation of 
non-parametric methods of statistical inference which are not 
greatly sensitive to unknown parameters. 


The Research Division offers attractive staff positions to scien- 
tists in a number of areas in addition to communication theory. 
Electrodynamics, plasma physics, ultramicrowaves, optical communications, 
and the physics of materials are among the areas 
of current interest. Letters of application or inquiry may be 
addressed to: 
Mr. W. M. Diefenbach 
Personnel Director 
Philco Research Center 
Blue Bell, Penna. 


RESEARCH CENTER 


PHILCO 


Famous for Quality the World Over® 


An equal opportunity employer 





for placement 
or personnel 


data processing * mathematical 
sciences * operations research 
all levels * coast to coast 
contact us in confidence 
HERBERT HALBRECHT 
332 south michigan ave 
chicago four, illinois 
suite 540 harrison 7-2876 


MANAGEMENT COUNSEL 
EXECUTIVE RECRUITMENT 





HANDBOOK OF STATISTICAL TABLES 

By Donald B. Owen, Sandia Corporation 

Offers an unusually complete collection of tables of functions used in 
statistics. Extremely helpful to statisticians, quality control and indus- 
trial engineers. Some 100 different tables are included. 


c. 576 pp, 1962—$10.00 
MATHEMATICAL PROGRAMMING 
By S. Vajda, British Admiralty Research Laboratory, Teddington 
This well-organized book offers a large number of fine worked-out ex- 
amples and problems. Early chapters offer an exhaustive treatment of 
the theoretical foundations, always with the use of fairly elementary 
mathematics. Later chapters guide the reader to the rapidly expanding 
frontiers of the subject. 310 pp, 77 illus, 1961—$8.50 


INDEX OF MATHEMATICAL TABLES—Second Edition 


By A. Fletcher and J. C. P. Miller, L. Rosenhead and L. J. Comrie, 
Scientific Computing Service, Ltd. 

Indexes the contents of all important mathematical tables published 
since Briggs’ Logarithmorum Chilias Prima (1617). Some previously unpublished 
tables included. 2 Vols. Dec. 1961—approx. $42.00 


....... New from Addison-Wesley! 
Examination copies gladly provided. 


Write: 519 South Street 
Reading, Massachusetts 





Guide to Tables in 
Mathematical Statistics 
By J. A. GREENWOOD and H. O. HARTLEY 


This Guide, the most comprehensive compilation available, is the 
product of more than twenty years’ work under the sponsorship 
of the National Research Council, produced by and for the 
Committee on Mathematical Tables and Aids to Computation. 
The Committee believes that this monumental work will be a 
great service to the mathematical and statistical community, 
that scientists and technologists in many fields will find this 
Guide a valuable index to the widely scattered tables of prob- 
ability and statistics needed in their work. 1076 pages. $8.50 

Order from your bookstore, or 
PRINCETON UNIVERSITY PRESS 
Princeton, New Jersey 














INTERNATIONAL JOURNAL OF ABSTRACTS 


STATISTICAL THEORY AND METHOD 
A Journal of the International Statistical Institute 


The aim of this journal of abstracts is to give complete coverage of published papers in the field of statistical theory 
(including associated aspects of probability and other mathematical methods) and new published contributions to 
statistical method. 

All contributions in the following five journals—being wholly devoted to this field—are abstracted: Annals of 
Mathematical Statistics; Biometrika; Journal, Royal Statistical Society (Series B); Bulletin of Mathematical Statistics; 
Annals, Institute of Statistical Mathematics; and a further group of six journals are abstracted on a virtually complete 
basis as follows: Biometrics; Metrika; Metron; Review, International Statistical Institute; Technometrics; Sankhyā. There 
are about 250 other journals partly devoted to statistical theory and method from which the appropriate papers are 
abstracted. 

The abstracts are about 400 words long, the recommendation of UNESCO for the “long” abstract service: they 
are in the English language although the original language of the paper is noted on the abstract together with the 